langtest.datahandler.datasource.SynteticDataset

class SynteticDataset(dataset: dict, task: TaskManager)

Bases: BaseDataset

Example dataset class that loads data using the Hugging Face datasets library and can also generate synthetic math data.

__init__(dataset: dict, task: TaskManager)

Initialize the SynteticDataset class.

Parameters:
  • dataset (dict) – A dictionary containing dataset information:

      - data_source (str): Name of the dataset to load.
      - subset (str, optional): Sub-dataset name (default is ‘sst2’).

  • task (TaskManager) – Task to be evaluated on.

Methods

__init__(dataset, task)

Initialize the SynteticDataset class.

export_data(data, output_path)

Export data to a CSV file.

extract_data_with_equal_proportion(...)

Extract data with equal proportions from a dictionary.

load_data()

Load data based on the specified task.

load_raw_data()

Load raw data without any processing.

load_synthetic_math_data()

Load synthetic mathematical data for evaluation.

load_synthetic_nlp_data()

Load synthetic NLP data for evaluation from the Hugging Face datasets library.

rand_range(start, end)

Generate a random integer within a specified range.

replace_values(prompt, old_to_new)

Replace placeholders in the prompt with new values.

Attributes

data_sources

supported_tasks

export_data(data: List[Sample], output_path: str)

Export data to a CSV file.

Parameters:
  • data (List[Sample]) – A list of Sample objects to export.

  • output_path (str) – The path to save the CSV file.
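A minimal sketch of this kind of CSV export, assuming each sample can be flattened into a dictionary of column values. The SimpleSample dataclass and the export_samples function are hypothetical stand-ins for illustration; the real Sample class and the columns that LangTest writes are not described on this page and may differ.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class SimpleSample:
    """Hypothetical stand-in for a Sample object (not the LangTest class)."""

    original: str
    expected_result: str


def export_samples(data: List[SimpleSample], output_path: str) -> None:
    """Write a header row, then one CSV row per sample."""
    # Column names come from the dataclass, so an empty list still gets a header.
    fieldnames = [field.name for field in fields(SimpleSample)]
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(asdict(sample) for sample in data)
```

Opening the file with newline="" follows the csv module's recommendation and avoids blank lines between rows on Windows.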

static extract_data_with_equal_proportion(data_dict, total_samples)

Extract data with equal proportions from a dictionary.

Parameters:
  • data_dict (dict) – A dictionary containing data with labels.

  • total_samples (int) – The total number of samples to extract.

Returns:

Extracted data with equal label proportions.

Return type:

dict
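One way to picture equal-proportion extraction is the sketch below. It assumes data_dict maps each label to a list of items, which this page does not confirm, and sample_equal_proportion is a hypothetical function written for illustration rather than the library's implementation. The handling of a remainder (extra samples go to the first labels) and the seed parameter are also assumptions.

```python
import random
from typing import Dict, List


def sample_equal_proportion(
    data_dict: Dict[str, List[str]], total_samples: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Pick roughly total_samples items, split evenly across labels."""
    labels = list(data_dict)
    if not labels:
        return {}
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    per_label, remainder = divmod(total_samples, len(labels))
    result = {}
    for index, label in enumerate(labels):
        # Assumption: the first `remainder` labels each receive one extra item.
        quota = per_label + (1 if index < remainder else 0)
        items = list(data_dict[label])  # copy so the input is not mutated
        rng.shuffle(items)
        result[label] = items[:quota]
    return result
```

If a label holds fewer items than its quota, this sketch simply returns everything available for that label, so the result can be smaller than total_samples.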

load_data() → List[Sample]

Load data based on the specified task.

Returns:

A list of Sample objects containing loaded data.

Return type:

List[Sample]

load_raw_data()

Load raw data without any processing.

load_synthetic_math_data() → List[Sample]

Load synthetic mathematical data for evaluation.

Returns:

A list of Sample objects containing loaded data.

Return type:

List[Sample]

load_synthetic_nlp_data() → List[Sample]

Load synthetic NLP data for evaluation from the Hugging Face datasets library.

Returns:

A list of Sample objects containing loaded data.

Return type:

List[Sample]

static rand_range(start: int, end: int) → int

Generate a random integer within a specified range.

Parameters:
  • start (int) – The start of the range (inclusive).

  • end (int) – The end of the range (inclusive).

Returns:

A random integer within the specified range.

Return type:

int
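Because both bounds are documented as inclusive, the behaviour matches what Python's random.randint provides. The sketch below is an illustration under that assumption, not a copy of the library's code; inclusive_rand is a hypothetical name.

```python
import random


def inclusive_rand(start: int, end: int) -> int:
    """Return a random integer N with start <= N <= end."""
    # random.randint includes both endpoints, unlike random.randrange.
    return random.randint(start, end)


# Every draw stays within the closed interval [1, 3].
draws = [inclusive_rand(1, 3) for _ in range(100)]
```

By contrast, random.randrange(start, end) excludes end, so it would never return the upper bound.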

static replace_values(prompt: str, old_to_new: Dict[str, str]) → str

Replace placeholders in the prompt with new values.

Parameters:
  • prompt (str) – The prompt containing placeholders to be replaced.

  • old_to_new (Dict[str, str]) – A dictionary mapping old placeholders to new values.

Returns:

The prompt with placeholders replaced by their respective values.

Return type:

str
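Placeholder substitution of this kind can be sketched with repeated str.replace calls. The function name replace_placeholders and the angle-bracket placeholder format are assumptions made for the example; this page does not show what placeholders the library's prompts actually use.

```python
from typing import Dict


def replace_placeholders(prompt: str, old_to_new: Dict[str, str]) -> str:
    """Replace each placeholder key in prompt with its mapped value."""
    for old, new in old_to_new.items():
        prompt = prompt.replace(old, new)
    return prompt


# Hypothetical placeholders "<a>" and "<b>", chosen only for this example.
filled = replace_placeholders("What is <a> plus <b>?", {"<a>": "3", "<b>": "4"})
print(filled)  # prints: What is 3 plus 4?
```

Replacements are applied one after another, so a new value that itself contains a later placeholder would be substituted again. Keeping values free of placeholder text avoids that.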