langtest.datahandler.datasource.SparkDataset
- class SparkDataset(file_path: str | dict, task: TaskManager, **kwargs)
Bases: BaseDataset
Class to handle Spark datasets. Subclass of BaseDataset.
- __init__(file_path: str | dict, task: TaskManager, **kwargs) → None
Initializes a SparkDataset object.
- Parameters:
file_path (str | dict) – The path to the data file.
task (TaskManager) – Task to be evaluated on.
**kwargs –
Methods
__init__(file_path, task, **kwargs) – Initializes a SparkDataset object.
export_data(data, output_path) – Exports the data to the corresponding format and saves it to 'output_path'.
load_data() – Load data from any file and preprocess it based on the specified task.
load_raw_data() – Load data from a file into raw lists of strings.
Attributes
data_sources
dataset_size
supported_tasks
- export_data(data: List[Sample], output_path: str)
Exports the data to the corresponding format and saves it to ‘output_path’.
- load_data() → List[Sample]
Load data from any file and preprocess it based on the specified task.
- Returns:
A list of preprocessed data samples.
- Return type:
List[Sample]
- load_raw_data() → List[Dict]
Load data from a file into raw lists of strings.
- Returns:
The file parsed into a list of dicts.
- Return type:
List[Dict]
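Example
The three documented methods form a load, preprocess, export contract. The sketch below illustrates that contract only; it is not the langtest implementation. ToyDataset and ToySample are hypothetical stand-ins for SparkDataset and Sample, the CSV columns "text" and "label" are assumed for illustration, and the standard-library csv module takes the place of Spark so that the example runs on its own.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class ToySample:
    """Hypothetical stand-in for langtest's Sample type."""

    original: str
    expected: str


class ToyDataset:
    """Hypothetical stand-in mirroring the three documented methods.

    Not the langtest implementation: it reads a CSV file with the
    standard library instead of Spark.
    """

    def __init__(self, file_path: str, task: str, **kwargs) -> None:
        self.file_path = file_path
        self.task = task
        self.kwargs = kwargs

    def load_raw_data(self) -> List[Dict]:
        # Parse the file into a list of dicts, one per row, with no cleanup.
        with open(self.file_path, newline="", encoding="utf-8") as handle:
            return [dict(row) for row in csv.DictReader(handle)]

    def load_data(self) -> List[ToySample]:
        # Build samples from the raw rows; the only preprocessing here is
        # stripping surrounding whitespace (an assumed, illustrative step).
        return [
            ToySample(original=row["text"].strip(), expected=row["label"].strip())
            for row in self.load_raw_data()
        ]

    def export_data(self, data: List[ToySample], output_path: str) -> None:
        # Write the samples back out in the same column layout.
        with open(output_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=["text", "label"])
            writer.writeheader()
            for sample in data:
                writer.writerow({"text": sample.original, "label": sample.expected})


def round_trip_demo() -> List[Dict]:
    """Load a small file, export the samples, and re-read the export."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "in.csv"
        source.write_text(
            "text,label\n great movie ,positive\nawful,negative\n",
            encoding="utf-8",
        )
        dataset = ToyDataset(str(source), task="text-classification")
        samples = dataset.load_data()

        target = Path(tmp) / "out.csv"
        dataset.export_data(samples, str(target))
        return ToyDataset(str(target), task="text-classification").load_raw_data()
```

Calling round_trip_demo() loads two rows, exports the cleaned samples, and reads the exported file back as raw dicts. In the real class, the task argument is a TaskManager rather than a plain string, and the sample fields depend on the task being evaluated.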