Embedding Distance Metrics

 

We offer a range of embedding models from different hubs, with two default models preconfigured:

| Hub | Default Model |
|-----|---------------|
| OpenAI | text-embedding-ada-002 |
| HuggingFace | sentence-transformers/all-mpnet-base-v2 |

Users can specify the desired embedding model and hub to generate embeddings for the expected_result and actual_result. These embeddings can then be compared using various distance metrics defined in the configuration.
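
For illustration, the snippet below sketches how such a pair of embeddings could be produced with the HuggingFace default model via the sentence-transformers package; the example sentences and variable names are placeholders, not part of the library's API:

# Minimal sketch: embed an expected_result and an actual_result with the
# HuggingFace default model (sentence-transformers/all-mpnet-base-v2).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

expected_result = "The capital of France is Paris."  # placeholder text
actual_result = "Paris is the capital of France."    # placeholder text

expected_emb = model.encode(expected_result)  # 1-D NumPy array (768 dimensions)
actual_emb = model.encode(actual_result)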

When comparing embeddings, it’s crucial to use the appropriate distance metric. The library supports several distance metrics for this purpose:

| Metric Name | Description |
|-------------|-------------|
| Cosine similarity | Measures the cosine of the angle between two vectors. |
| Euclidean distance | Calculates the straight-line distance between two points in space. |
| Manhattan distance | Computes the sum of the absolute differences between corresponding elements of two vectors. |
| Chebyshev distance | Determines the maximum absolute difference between elements in two vectors. |
| Hamming distance | Measures the number of positions at which two equal-length sequences of symbols differ. |
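
As a reference for what each metric computes, here is a minimal NumPy sketch of the standard definitions. It illustrates the math only and is not the library's internal implementation; note that Hamming distance is defined for equal-length symbol sequences, so on real-valued embeddings it simply counts differing positions:

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between u and v; 1.0 means identical direction.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    # Straight-line (L2) distance between the two points.
    return float(np.linalg.norm(u - v))

def manhattan_distance(u, v):
    # Sum of absolute element-wise differences (L1).
    return float(np.sum(np.abs(u - v)))

def chebyshev_distance(u, v):
    # Maximum absolute element-wise difference (L-infinity).
    return float(np.max(np.abs(u - v)))

def hamming_distance(u, v):
    # Fraction of positions at which the two equal-length vectors differ.
    return float(np.mean(u != v))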

Configuration Structure

To configure your embedding models and evaluation metrics, you can use a YAML configuration file. The configuration structure includes:

  • model_parameters: model-related parameters such as temperature and max_tokens.
  • evaluation: the evaluation metric, distance, and threshold.
  • embeddings: the embedding model and hub to use.
  • tests: the test scenarios and their min_pass_rate.

Here’s an example of the configuration structure:

model_parameters:
  temperature: 0.2
  max_tokens: 64

evaluation:
  metric: embedding_distance
  distance: cosine
  threshold: 0.8

embeddings:
  model: text-embedding-ada-002
  hub: openai

tests:
  defaults:
    min_pass_rate: 1.0

  robustness:
    add_typo:
      min_pass_rate: 0.70
    lowercase:
      min_pass_rate: 0.70
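
One plausible reading of this configuration is that a test case passes when the chosen distance between the embeddings of expected_result and actual_result meets the 0.8 threshold, and a test category passes when the fraction of passing cases reaches its min_pass_rate. The sketch below illustrates that reading only; whether cosine is treated as a similarity (higher is better) or as a distance is an assumption here, not a statement of the library's internals:

import numpy as np

# Illustrative only: assumes cosine is treated as a similarity, so a test case
# passes when cosine_similarity >= threshold. Embeddings are random placeholders.
def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

threshold = 0.8        # evaluation.threshold
min_pass_rate = 0.70   # e.g. tests.robustness.add_typo.min_pass_rate

rng = np.random.default_rng(0)
pairs = [(rng.random(768), rng.random(768)) for _ in range(10)]  # (expected, actual)

passes = [cosine_similarity(e, a) >= threshold for e, a in pairs]
pass_rate = sum(passes) / len(passes)
print(f"pass rate: {pass_rate:.2f} -> category passes: {pass_rate >= min_pass_rate}")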