langtest.datahandler.datasource.SynteticDataset

class SynteticDataset(dataset: dict, task: TaskManager)

Bases: BaseDataset

Example dataset class that loads data using the Hugging Face datasets library and also generates synthetic math data.

__init__(dataset: dict, task: TaskManager)

Initialize the SynteticDataset class.

Parameters:

dataset (dict) – A dictionary containing dataset information:
- data_source (str): Name of the dataset to load.
- subset (str, optional): Sub-dataset name (default is ‘sst2’).
task (TaskManager) – Task to be evaluated on.
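
A minimal sketch of the `dataset` dictionary described above. Only the `data_source` and `subset` keys and the `sst2` default come from the parameter description; the helper `build_dataset_config` and the value `"synthetic-nlp-data"` are assumptions made for illustration and are not confirmed by this page.

```python
from typing import Dict


def build_dataset_config(data_source: str, subset: str = "sst2") -> Dict[str, str]:
    """Hypothetical helper: assemble the dict shape described for `dataset`."""
    return {"data_source": data_source, "subset": subset}


# "synthetic-nlp-data" is an assumed source name, used only for illustration.
config = build_dataset_config("synthetic-nlp-data")
print(config)
```

Whether `subset` applies to every data source is not stated here, so treat the default as relevant only where a sub-dataset exists.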

Methods

`__init__`(dataset, task)	Initialize the SynteticDataset class.
`export_data`(data, output_path)	Export data to a CSV file.
`extract_data_with_equal_proportion`(...)	Extract data with equal proportions from a dictionary.
`load_data`()	Load data based on the specified task.
`load_raw_data`()	Load raw data without any processing.
`load_synthetic_math_data`()	Load synthetic mathematical data for evaluation.
`load_synthetic_nlp_data`()	Load synthetic NLP data for evaluation from the Hugging Face library.
`rand_range`(start, end)	Generate a random integer within a specified range.
`replace_values`(prompt, old_to_new)	Replace placeholders in the prompt with new values.

Attributes

`data_sources`
`supported_tasks`

export_data(data: List[Sample], output_path: str)

Export data to a CSV file.

Parameters:

data (List[Sample]) – A list of Sample objects to export.
output_path (str) – The path to save the CSV file.
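
The export step can be sketched with the standard `csv` module. This is not the library's implementation: `DemoSample` is a hypothetical stand-in for `Sample` (whose real fields are not listed on this page), and `export_rows` is an illustrative name.

```python
import csv
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DemoSample:
    """Hypothetical stand-in for Sample; the real fields may differ."""
    original: str
    expected: str


def export_rows(data: List[DemoSample], output_path: str) -> None:
    """Write one CSV row per sample, with a header built from the field names."""
    if not data:
        return  # nothing to write
    fieldnames = list(asdict(data[0]).keys())
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for sample in data:
            writer.writerow(asdict(sample))


# Write to a temporary directory and read the file back to check the result.
path = os.path.join(tempfile.mkdtemp(), "samples.csv")
export_rows([DemoSample("2 + 2 = 4", "Agree")], path)
with open(path, newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(rows)
```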

static extract_data_with_equal_proportion(data_dict, total_samples)

Extract data with equal proportions from a dictionary.

Parameters:

data_dict (dict) – A dictionary containing data with labels.
total_samples (int) – The total number of samples to extract.

Returns:

Extracted data with equal label proportions.

Return type:

dict
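
One way to read "equal proportions" is an equal share of items per label. The sketch below assumes `data_dict` maps each label to a list of items and takes the first items of each list; the actual method may sample differently or handle remainders in another way. `take_equal_share` is a hypothetical name.

```python
from typing import Dict, List


def take_equal_share(
    data_dict: Dict[str, List[str]], total_samples: int
) -> Dict[str, List[str]]:
    """Keep the same number of items per label, up to total_samples overall."""
    if not data_dict:
        return {}
    per_label = total_samples // len(data_dict)  # integer share for each label
    return {label: items[:per_label] for label, items in data_dict.items()}


labelled = {
    "positive": ["p1", "p2", "p3"],
    "negative": ["n1", "n2", "n3", "n4"],
}
balanced = take_equal_share(labelled, total_samples=4)
print({label: len(items) for label, items in balanced.items()})
```

With two labels and `total_samples=4`, each label keeps two items. If `total_samples` is not divisible by the number of labels, this sketch silently drops the remainder.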

load_data() → List[Sample]

Load data based on the specified task.

Returns:

A list of Sample objects containing loaded data.

Return type:

List[Sample]

load_raw_data()

Load raw data without any processing.

load_synthetic_math_data() → List[Sample]

Load synthetic mathematical data for evaluation.

Returns:

A list of Sample objects containing loaded data.

Return type:

List[Sample]

load_synthetic_nlp_data() → List[Sample]

Load synthetic NLP data for evaluation from the Hugging Face library.

Returns:

A list of Sample objects containing loaded data.

Return type:

List[Sample]

static rand_range(start: int, end: int) → int

Generate a random integer within a specified range.

Parameters:

start (int) – The start of the range (inclusive).
end (int) – The end of the range (inclusive).

Returns:

A random integer within the specified range.

Return type:

int
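
Because both bounds are documented as inclusive, the behavior matches Python's `random.randint`. The sketch below assumes that equivalence; `rand_range_sketch` is a hypothetical name and may not reflect how the method is actually written.

```python
import random


def rand_range_sketch(start: int, end: int) -> int:
    """Return a random integer N such that start <= N <= end."""
    return random.randint(start, end)  # randint includes both endpoints


# Every draw falls inside the closed interval [1, 3].
draws = [rand_range_sketch(1, 3) for _ in range(100)]
print(min(draws) >= 1 and max(draws) <= 3)
```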

static replace_values(prompt: str, old_to_new: Dict[str, str]) → str

Replace placeholders in the prompt with new values.

Parameters:

prompt (str) – The prompt containing placeholders to be replaced.
old_to_new (Dict[str, str]) – A dictionary mapping old placeholders to new values.

Returns:

The prompt with placeholders replaced by their respective values.

Return type:

str
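
Placeholder replacement of this kind can be sketched as repeated `str.replace` calls. The bracketed placeholder style (`[A]`, `[B]`) and the name `replace_values_sketch` are assumptions for illustration; the page does not say what the placeholders look like.

```python
from typing import Dict


def replace_values_sketch(prompt: str, old_to_new: Dict[str, str]) -> str:
    """Replace each placeholder key in the prompt with its mapped value."""
    for old, new in old_to_new.items():
        prompt = prompt.replace(old, new)
    return prompt


template = "What is [A] + [B]?"
filled = replace_values_sketch(template, {"[A]": "2", "[B]": "3"})
print(filled)  # What is 2 + 3?
```

Replacements in this sketch are applied one after another, so a substituted value that itself contains a later placeholder would be replaced again.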