langtest.pipelines.utils.data_helpers.ner_dataset.NERDataset#

class NERDataset(tokens: List[List[str]], labels: List[List[str]], tokenizer: PreTrainedTokenizer | PreTrainedTokenizerFast, max_length: int = 128)#

Bases: Dataset

Dataset for the NER (named entity recognition) task

__init__(tokens: List[List[str]], labels: List[List[str]], tokenizer: PreTrainedTokenizer | PreTrainedTokenizerFast, max_length: int = 128)#

Constructor method

Parameters:
  • tokens (List[List[str]]) – list of tokens per sample

  • labels (List[List[str]]) – list of labels per sample

  • tokenizer (Union[PreTrainedTokenizer, PreTrainedTokenizerFast]) – tokenizer to use

  • max_length (int) – maximum number of tokens per sample

Methods

__init__(tokens, labels, tokenizer[, max_length])

Constructor method
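Example

A minimal usage sketch, not taken from the library docs: it assumes a Hugging Face fast tokenizer, that NERDataset follows the standard torch.utils.data.Dataset protocol implied by its base class, and that each sample is encoded to a fixed max_length so the default DataLoader collation works. The toy tokens, labels, and the "bert-base-cased" checkpoint are illustrative choices, and the exact keys returned per sample depend on the implementation of __getitem__.

from torch.utils.data import DataLoader
from transformers import AutoTokenizer

from langtest.pipelines.utils.data_helpers.ner_dataset import NERDataset

# Toy data: one list of tokens and one list of labels per sample.
tokens = [["John", "lives", "in", "Berlin"], ["Acme", "Corp", "hired", "Mary"]]
labels = [["B-PER", "O", "O", "B-LOC"], ["B-ORG", "I-ORG", "O", "B-PER"]]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # fast tokenizer

dataset = NERDataset(tokens=tokens, labels=labels, tokenizer=tokenizer, max_length=128)

# The Dataset protocol guarantees indexing and length.
print(len(dataset))
sample = dataset[0]  # per-sample encoding; exact fields depend on __getitem__

# Assuming fixed-length encodings, the dataset can be batched with a DataLoader.
loader = DataLoader(dataset, batch_size=2)
for batch in loader:
    print(batch)
    break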