langtest.datahandler.datasource.ConllDataset
- class ConllDataset(file_path: str | Dict[str, str], task: TaskManager, **kwargs)
Bases: BaseDataset
Class to handle CoNLL files. Subclass of BaseDataset.
- __init__(file_path: str | Dict[str, str], task: TaskManager, **kwargs) -> None
Initializes ConllDataset object.
- Parameters:
file_path (str | Dict[str, str]) – Path to the data file.
task (TaskManager) – Task to perform.
Methods
__init__(file_path, task, **kwargs) – Initializes ConllDataset object.
export_data(data, output_path) – Exports the data to the corresponding format and saves it to 'output_path'.
load_data() – Loads data from a CoNLL file.
load_raw_data() – Loads the dataset into a list of tokens and labels.
Attributes
COLUMN_NAMES
data_sources
dataset_size
supported_tasks
- export_data(data: List[NERSample], output_path: str)
Exports the data to the corresponding format and saves it to ‘output_path’.
- Parameters:
data (List[NERSample]) – data to export
output_path (str) – path to save the data to
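To illustrate what exporting token-level data to a CoNLL-style file can look like, here is a minimal, self-contained sketch. It does not use langtest and is not the library's implementation: the helper name `write_conll`, the plain (tokens, labels) tuples standing in for NERSample objects, and the simplified two-column layout are all assumptions made for this example.

```python
import os
import tempfile
from typing import List, Tuple


def write_conll(sentences: List[Tuple[List[str], List[str]]], output_path: str) -> None:
    """Hypothetical helper: write (tokens, labels) pairs as two-column CoNLL text."""
    with open(output_path, "w", encoding="utf-8") as fh:
        for tokens, labels in sentences:
            # One "token label" pair per line.
            for token, label in zip(tokens, labels):
                fh.write(f"{token} {label}\n")
            # A blank line separates sentences.
            fh.write("\n")


# Stand-in data; real code would pass NERSample objects to export_data instead.
sentences = [(["John", "lives"], ["B-PER", "O"]), (["Berlin"], ["B-LOC"])]
output_path = os.path.join(tempfile.mkdtemp(), "sample.conll")
write_conll(sentences, output_path)

with open(output_path, encoding="utf-8") as fh:
    written = fh.read()
```

Full CoNLL-2003 files carry additional part-of-speech and chunk columns; the sketch omits them to keep the layout easy to follow.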
- load_data() -> List[NERSample]
Loads data from a CoNLL file.
- Returns:
List of formatted sentences from the dataset.
- Return type:
List[NERSample]
- load_raw_data() -> List[Dict]
Loads the dataset into a list of tokens and labels.
- Returns:
List of dicts containing tokens and labels.
- Return type:
List[Dict]
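As a rough picture of the "list of dicts containing tokens and labels" shape, the following self-contained sketch parses CoNLL-style text into one dict per sentence. It is an illustration only, not langtest's code: the function name `parse_conll`, the dict keys `"tokens"` and `"labels"`, and the choice of the first and last columns are assumptions for this example.

```python
from typing import Dict, List


def parse_conll(text: str) -> List[Dict[str, List[str]]]:
    """Hypothetical helper: parse CoNLL-style text into one dict per sentence."""
    sentences: List[Dict[str, List[str]]] = []
    tokens: List[str] = []
    labels: List[str] = []

    for line in text.splitlines():
        line = line.strip()
        # A blank line or a -DOCSTART- marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append({"tokens": tokens, "labels": labels})
                tokens, labels = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])   # first column: the token
        labels.append(columns[-1])  # last column: the NER label

    # Flush the final sentence if the text does not end with a blank line.
    if tokens:
        sentences.append({"tokens": tokens, "labels": labels})
    return sentences


SAMPLE = """-DOCSTART- -X- -X- O

John NNP B-NP B-PER
lives VBZ B-VP O
in IN B-PP O
Berlin NNP B-NP B-LOC
"""

raw = parse_conll(SAMPLE)
```

With the sample above, `raw` holds a single sentence whose tokens and labels are parallel lists of equal length.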