langtest.embeddings.huggingface.HuggingfaceEmbeddings#
- class HuggingfaceEmbeddings(model: str = 'sentence-transformers/all-mpnet-base-v2')#
Bases: object
A simple class to handle the sentence transformation using the specified model.
- device#
The device used for computations, i.e., either a GPU (if available) or a CPU.
- Type:
torch.device
- tokenizer#
The tokenizer associated with the model.
- Type:
transformers.AutoTokenizer
- model#
The transformer model used for sentence embeddings.
- Type:
transformers.AutoModel
- __init__(model: str = 'sentence-transformers/all-mpnet-base-v2')#
Constructor method
- Parameters:
model (str) – The name of the model to be loaded. Defaults to the sentence-transformers/all-mpnet-base-v2 model.
Methods

__init__([model])
Constructor method

get_embedding(sentences[, convert_to_tensor, max_length])
Encode sentences into sentence embeddings.

mean_pooling(model_output, attention_mask)
Apply mean pooling on the model outputs.
- get_embedding(sentences: str | List[str], convert_to_tensor: bool = False, max_length: int = 128) → Tensor | ndarray#
Encode sentences into sentence embeddings.
- Parameters:
sentences (str | list) – The sentences to be encoded. Can be either a single string or a list of strings.
convert_to_tensor (bool, optional) – If set to True, the method will return tensors, otherwise it will return numpy arrays. Defaults to False.
max_length (int, optional) – The maximum length for the sentences. Any sentence exceeding this length gets truncated. Defaults to 128.
- Returns:
The sentence embeddings: a torch.Tensor if convert_to_tensor is True, otherwise a numpy.ndarray.
- Return type:
torch.Tensor | numpy.ndarray
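The input-handling contract of get_embedding can be sketched as follows. This is a minimal illustration, not the library's implementation: a toy byte-based encoder (a hypothetical `toy_token_vector` helper) stands in for the transformer so that the single-string-versus-list handling, max_length truncation, and convert_to_tensor return-type switch can be shown without downloading a model.

```python
import numpy as np

EMBED_DIM = 8  # toy size; the real all-mpnet-base-v2 model outputs 768 dims


def toy_token_vector(token: str) -> np.ndarray:
    # Deterministic stand-in for a transformer token embedding,
    # derived from the token's UTF-8 bytes.
    data = token.encode("utf-8")
    return np.array([data[i % len(data)] for i in range(EMBED_DIM)], dtype=float)


def get_embedding_sketch(sentences, convert_to_tensor=False, max_length=128):
    # A single string is treated as a batch of one, as in get_embedding.
    if isinstance(sentences, str):
        sentences = [sentences]
    batch = []
    for sent in sentences:
        # Crude "tokenization": whitespace split, truncated to max_length.
        tokens = sent.split()[:max_length]
        vectors = [toy_token_vector(t) for t in tokens] or [np.zeros(EMBED_DIM)]
        # Pool token vectors into one sentence embedding.
        batch.append(np.mean(vectors, axis=0))
    embeddings = np.stack(batch)
    # convert_to_tensor switches the return type, mirroring the real API.
    if convert_to_tensor:
        import torch  # deferred so the NumPy path has no torch dependency
        return torch.from_numpy(embeddings)
    return embeddings
```

With the real class, the same call shape applies: a single string yields a batch of one embedding, and a list yields one embedding per sentence.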
- mean_pooling(model_output: Tuple[Tensor], attention_mask: Tensor) → Tensor#
Apply mean pooling on the model outputs.
- Parameters:
model_output (Tuple[torch.Tensor]) – The model’s output; the first element holds the token embeddings.
attention_mask (torch.Tensor) – The attention mask tensor.
- Returns:
The mean pooled output tensor.
- Return type:
torch.Tensor
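The computation behind mean pooling can be sketched in NumPy. This is a rendering of the standard sentence-transformers mean-pooling recipe, not the class's exact torch code: padded positions are masked out and each sentence's token embeddings are averaged over its real tokens only.

```python
import numpy as np


def mean_pooling_sketch(token_embeddings: np.ndarray,
                        attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden) array, corresponding to
        model_output[0] in the torch version.
    attention_mask: (batch, seq_len) array of 1s (real tokens) and 0s (padding).
    """
    # Expand the mask to (batch, seq_len, 1) so broadcasting zeroes out
    # the embeddings of padded tokens.
    mask = attention_mask[:, :, np.newaxis].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    # Count real tokens per sentence; clamp to avoid division by zero
    # (the torch recipe typically uses torch.clamp(min=1e-9)).
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts
```

For a batch of one sentence with two real tokens and one padding token, the padded position contributes nothing: only the two real token vectors are averaged.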