langtest.datahandler.datasource.PandasDataset

class PandasDataset(file_path: str, task: TaskManager, **kwargs)

Bases: BaseDataset

Class to handle Pandas datasets. Subclass of BaseDataset.

__init__(file_path: str, task: TaskManager, **kwargs) -> None

Initializes a PandasDataset object.

Parameters:

file_path (str) – The path to the data file.

task (TaskManager) – The task that determines how the loaded data is preprocessed.

Raises:

ValueError – If the specified task is unsupported.
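The Raises entry above can be illustrated with a minimal sketch of a task check. This is not langtest's implementation: `SUPPORTED_TASKS` and `check_task` are hypothetical names, and the task names listed are illustrative stand-ins for whatever the class's `supported_tasks` attribute actually holds.

```python
from typing import List

# Hypothetical stand-in for the class's `supported_tasks` attribute;
# the real contents are not given in this reference page.
SUPPORTED_TASKS: List[str] = ["ner", "text-classification"]


def check_task(task_name: str) -> str:
    """Return the task name unchanged if it is supported, else raise ValueError."""
    if task_name not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task '{task_name}'. Expected one of: {SUPPORTED_TASKS}"
        )
    return task_name
```

Failing early in the constructor, rather than in load_data(), would surface a misconfigured task before any file is read.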

Methods

__init__(file_path, task, **kwargs)

Initializes a PandasDataset object.

export_data(data, output_path)

Exports the data to the corresponding format and saves it to 'output_path'.

load_data()

Load data from a CSV file and preprocess it based on the specified task.

load_raw_data([standardize_columns])

Loads data from a file into raw lists of strings.

renamed_extensions([inverted])

Rename the file extensions to the correct format.

Attributes

COLUMN_NAMES

data_sources

supported_tasks

export_data(data: List[Sample], output_path: str)

Exports the data to the corresponding format and saves it to 'output_path'.

load_data() -> List[Sample]

Load data from a CSV file and preprocess it based on the specified task.

Returns:

A list of preprocessed data samples.

Return type:

List[Sample]

load_raw_data(standardize_columns: bool = False) -> List[Dict]

Loads data from a file into raw lists of strings.

Parameters:

standardize_columns (bool) – Whether to standardize column names. Defaults to False.

Returns:

The parsed file as a list of dicts, one per row.

Return type:

List[Dict]
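To make the return shape concrete, here is a minimal sketch of reading a CSV into a list of dicts using only the standard library. It is not the library's code: `load_raw_rows` is a hypothetical helper, and "standardizing" is assumed here to mean stripping whitespace and lowercasing the column names, which may differ from what langtest does.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_raw_rows(handle: TextIO, standardize_columns: bool = False) -> List[Dict[str, str]]:
    """Parse CSV text from an open handle into one dict per row."""
    rows = []
    for row in csv.DictReader(handle):
        if standardize_columns:
            # Assumption: standardizing means stripped, lowercase column names.
            row = {key.strip().lower(): value for key, value in row.items()}
        rows.append(dict(row))
    return rows


# In-memory sample so the sketch runs without a file on disk.
sample = io.StringIO("Text,Label\nhello world,positive\n")
rows = load_raw_rows(sample, standardize_columns=True)
```

With the sample above, `rows` holds a single dict whose keys are `text` and `label`.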

classmethod renamed_extensions(inverted: bool = False) -> Dict[str, str]

Rename the file extensions to the correct format.
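The reference does not say what the returned mapping contains or what `inverted` does. As a sketch under stated assumptions, the function below treats the result as an alias table between file extensions and format names, with `inverted=True` swapping keys and values. The table contents and the module-level function are hypothetical, not langtest's actual values.

```python
from typing import Dict

# Hypothetical alias table: file extension -> format name.
EXTENSION_ALIASES: Dict[str, str] = {"xlsx": "excel", "h5": "hdf5"}


def renamed_extensions(inverted: bool = False) -> Dict[str, str]:
    """Return the alias table, or its key/value inverse when `inverted` is True."""
    if inverted:
        # Assumption: inversion maps format names back to file extensions.
        return {name: ext for ext, name in EXTENSION_ALIASES.items()}
    # Return a copy so callers cannot mutate the shared table.
    return dict(EXTENSION_ALIASES)
```

Under this reading, the forward mapping would serve when picking a reader for a file and the inverted one when choosing an extension for export.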