We offer a range of embedding models from different hubs, with two default models preconfigured.
Users can specify the desired embedding model and hub to generate embeddings for the `actual_result`. These embeddings can then be compared using various distance metrics defined in the configuration.
When comparing embeddings, it’s crucial to use the appropriate distance metric. The library supports several distance metrics for this purpose:
| Metric | Description |
|--------|-------------|
| Cosine similarity | Measures the cosine of the angle between two vectors. |
| Euclidean distance | Calculates the straight-line distance between two points in space. |
| Manhattan distance | Computes the sum of the absolute differences between corresponding elements of two vectors. |
| Chebyshev distance | Determines the maximum absolute difference between elements in two vectors. |
| Hamming distance | Measures the number of positions at which the corresponding symbols of two equal-length sequences differ. |
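The five metrics above can be sketched in a few lines of NumPy. This is an illustrative implementation only (the function names and the use of NumPy are assumptions, not the library's internal API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the two vectors (1.0 = identical direction).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance between the two points.
    return float(np.linalg.norm(a - b))

def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    # Sum of absolute element-wise differences.
    return float(np.sum(np.abs(a - b)))

def chebyshev(a: np.ndarray, b: np.ndarray) -> float:
    # Maximum absolute element-wise difference.
    return float(np.max(np.abs(a - b)))

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    # Number of positions where two equal-length sequences differ.
    return int(np.sum(a != b))
```

For orthogonal unit vectors such as `[1, 0]` and `[0, 1]`, cosine similarity is 0 while Euclidean distance is √2, which is why the configured threshold must match the chosen metric.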
To configure your embedding models and evaluation metrics, you can use a YAML configuration file. The configuration structure includes:
- `model_parameters`: specifies model-related parameters.
- `evaluation`: sets the evaluation metric, distance measure, and threshold.
- `embeddings`: allows you to choose the embedding model and hub.
- `tests`: defines different test scenarios and their minimum pass rates.
Here’s an example of the configuration structure:
```yaml
model_parameters:
  temperature: 0.2
  max_tokens: 64

evaluation:
  metric: embedding_distance
  distance: cosine
  threshold: 0.8

embeddings:
  model: text-embedding-ada-002
  hub: openai

tests:
  defaults:
    min_pass_rate: 1.0
  robustness:
    add_typo:
      min_pass_rate: 0.70
    lowercase:
      min_pass_rate: 0.70
```