langtest.datahandler.datasource.SynteticDataset
- class SynteticDataset(dataset: dict, task: TaskManager)
Bases: BaseDataset
Example dataset class that loads data with the Hugging Face datasets library and also generates synthetic math data.
- __init__(dataset: dict, task: TaskManager)
Initialize the SynteticDataset class.
- Parameters:
dataset (dict) – A dictionary containing dataset information:
  - data_source (str): Name of the dataset to load.
  - subset (str, optional): Sub-dataset name (default is ‘sst2’).
task (TaskManager) – Task to be evaluated on.
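The dataset dictionary has one required key and one optional key with a documented default. The sketch below shows that contract using a hypothetical helper, normalize_dataset_config, which is not part of langtest. The data_source value in the usage line is an assumed example and is not taken from this reference.

```python
from typing import Any, Dict

DEFAULT_SUBSET = "sst2"  # default documented for the `subset` key


def normalize_dataset_config(dataset: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: require `data_source` and fill in the default `subset`."""
    if "data_source" not in dataset:
        raise KeyError("dataset must contain a 'data_source' key")
    config = dict(dataset)  # copy so the caller's dict is not mutated
    config.setdefault("subset", DEFAULT_SUBSET)
    return config


# Assumed data_source name, for illustration only.
config = normalize_dataset_config({"data_source": "synthetic-nlp-data"})
```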
Methods
- __init__(dataset, task): Initialize the SynteticDataset class.
- export_data(data, output_path): Export data to a CSV file.
- extract_data_with_equal_proportion(data_dict, total_samples): Extract data with equal proportions from a dictionary.
- load_data(): Load data based on the specified task.
- load_raw_data(): Load raw data without any processing.
- load_synthetic_math_data(): Load synthetic mathematical data for evaluation.
- load_synthetic_nlp_data(): Load synthetic NLP data for evaluation from the Hugging Face library.
- rand_range(start, end): Generate a random integer within a specified range.
- replace_values(prompt, old_to_new): Replace placeholders in the prompt with new values.
Attributes
- data_sources
- supported_tasks
- export_data(data: List[Sample], output_path: str)
Export data to a CSV file.
- Parameters:
data (List[Sample]) – A list of Sample objects to export.
output_path (str) – The path to save the CSV file.
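This reference does not give the CSV layout that export_data writes. Below is a minimal sketch that assumes one row per sample, with a header row built from the sample's fields. The two-field Sample dataclass is a hypothetical stand-in, because langtest's real Sample class carries more fields.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class Sample:
    """Hypothetical stand-in for langtest's Sample; the real class has more fields."""
    original: str
    expected_results: str


def export_data(data: List[Sample], output_path: str) -> None:
    """Write one CSV row per sample, with a header row taken from the dataclass fields."""
    fieldnames = [f.name for f in fields(Sample)]
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(output_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for sample in data:
            writer.writerow(asdict(sample))
```

csv.DictWriter quotes values that contain commas, so free-text prompts can be round-tripped safely.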
- static extract_data_with_equal_proportion(data_dict, total_samples)
Extract data with equal proportions from a dictionary.
- Parameters:
data_dict (dict) – A dictionary containing data with labels.
total_samples (int) – The total number of samples to extract.
- Returns:
Extracted data with equal label proportions.
- Return type:
dict
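Equal-proportion extraction is a form of stratified sampling: group the items by label, then take the same number from each group. The sketch below assumes data_dict maps each text to its label and that the result has the same shape. Both are assumptions, since the reference only says "a dictionary containing data with labels". The seed parameter is a hypothetical addition that makes the output reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional


def extract_data_with_equal_proportion(
    data_dict: Dict[str, Hashable], total_samples: int, seed: Optional[int] = None
) -> Dict[str, Hashable]:
    """Keep up to total_samples // n_labels randomly chosen items per label."""
    rng = random.Random(seed)  # hypothetical `seed` for reproducible draws
    by_label: Dict[Hashable, List[str]] = defaultdict(list)
    for text, label in data_dict.items():
        by_label[label].append(text)
    if not by_label:
        return {}
    per_label = total_samples // len(by_label)
    extracted: Dict[str, Hashable] = {}
    for label, texts in by_label.items():
        rng.shuffle(texts)  # random subset within each label
        for text in texts[:per_label]:
            extracted[text] = label
    return extracted
```

In this sketch, a label with fewer than per_label items contributes everything it has. The proportions are then only as equal as the data allows. Any remainder from the integer division is also dropped.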
- load_data() → List[Sample]
Load data based on the specified task.
- Returns:
A list of Sample objects containing loaded data.
- Return type:
List[Sample]
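This reference does not show how load_data chooses a loader. The sketch below assumes it dispatches on the dataset's data_source value to one of the two synthetic loaders documented below. The source names "synthetic-math-data" and "synthetic-nlp-data" and the stand-in loader functions are assumptions, not the library's code.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for load_synthetic_math_data / load_synthetic_nlp_data.
def load_math() -> List[str]:
    return ["math sample"]


def load_nlp() -> List[str]:
    return ["nlp sample"]


# Assumed data_source names, each mapped to the loader it selects.
LOADERS: Dict[str, Callable[[], List[str]]] = {
    "synthetic-math-data": load_math,
    "synthetic-nlp-data": load_nlp,
}


def load_data(dataset: dict) -> List[str]:
    """Pick a loader from the dataset's data_source and return its samples."""
    source = dataset["data_source"]
    if source not in LOADERS:
        raise ValueError(f"unsupported data_source {source!r}; expected one of {sorted(LOADERS)}")
    return LOADERS[source]()
```

A lookup table keeps the supported sources in one place, which is roughly the role a data_sources attribute could play.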
- load_raw_data()
Load raw data without any processing.
- load_synthetic_math_data() → List[Sample]
Load synthetic mathematical data for evaluation.
- Returns:
A list of Sample objects containing loaded data.
- Return type:
List[Sample]
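This reference does not show the prompts or fields of the synthetic math data. The sketch below only illustrates the general idea of building arithmetic items from a template with random operands. The template text, the [x]/[y] placeholders, the 1 to 100 operand range, and the question/answer keys are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Union

# Hypothetical template; the real prompts are not documented here.
TEMPLATE = "What is [x] + [y]?"


def make_math_samples(n: int, seed: Optional[int] = None) -> List[Dict[str, Union[str, int]]]:
    """Generate n addition questions paired with their correct answers."""
    rng = random.Random(seed)
    samples: List[Dict[str, Union[str, int]]] = []
    for _ in range(n):
        x, y = rng.randint(1, 100), rng.randint(1, 100)  # both bounds inclusive
        question = TEMPLATE.replace("[x]", str(x)).replace("[y]", str(y))
        samples.append({"question": question, "answer": x + y})
    return samples
```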
- load_synthetic_nlp_data() → List[Sample]
Load synthetic NLP data for evaluation from the Hugging Face library.
- Returns:
A list of Sample objects containing loaded data.
- Return type:
List[Sample]
- static rand_range(start: int, end: int) → int
Generate a random integer within a specified range.
- Parameters:
start (int) – The start of the range (inclusive).
end (int) – The end of the range (inclusive).
- Returns:
A random integer within the specified range.
- Return type:
int
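Both ends of the range are documented as inclusive. In Python's standard library, random.randint has exactly that contract, while random.randrange(start, end) excludes end. The sketch below assumes rand_range wraps randint. That is an assumption about the implementation, not something this reference states.

```python
import random


def rand_range(start: int, end: int) -> int:
    """Return a random integer N with start <= N <= end (both ends inclusive)."""
    # random.randint includes both endpoints, unlike random.randrange(start, end).
    return random.randint(start, end)
```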
- static replace_values(prompt: str, old_to_new: Dict[str, str]) → str
Replace placeholders in the prompt with new values.
- Parameters:
prompt (str) – The prompt containing placeholders to be replaced.
old_to_new (Dict[str, str]) – A dictionary mapping old placeholders to new values.
- Returns:
The prompt with placeholders replaced by their respective values.
- Return type:
str
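Below is a minimal sketch of placeholder substitution with str.replace, assuming every occurrence of each key is replaced. The [x]/[y] placeholder syntax in the usage check is hypothetical, because this reference does not specify a placeholder format.

```python
from typing import Dict


def replace_values(prompt: str, old_to_new: Dict[str, str]) -> str:
    """Replace every occurrence of each placeholder key with its value."""
    for old, new in old_to_new.items():
        # Replacements run in dict insertion order, so a value that contains
        # a later placeholder would be replaced again on a later pass.
        prompt = prompt.replace(old, new)
    return prompt


filled = replace_values("[x] + [y] = [x]", {"[x]": "2", "[y]": "3"})
```

Sequential replacement is simple, but it means substituted values should not themselves contain placeholder text.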