we employ a more capable Large Language Model (LLM) to evaluate the model’s response. Here is how it operates in LangTest for robustness testing:
- The evaluation is conducted on the provided data by assessing the original_question and its expected results (ground truth), as well as the perturbed question and its actual results.
- The outcome of the evaluation determines whether the actual results align with the expected results (ground truth); a minimal sketch of this comparison is shown below.
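The snippet below is not LangTest’s internal implementation; it is only a minimal sketch of the LLM-as-judge idea, assuming the `openai` Python client and a hypothetical pair of questions and answers. It asks the evaluator model whether the answer to the perturbed question still conveys the ground truth.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge(original_question, expected_result, perturbed_question, actual_result):
    """Ask the evaluator LLM whether the perturbed answer matches the ground truth."""
    prompt = (
        "You are grading a question-answering model.\n"
        f"Original question: {original_question}\n"
        f"Expected answer (ground truth): {expected_result}\n"
        f"Perturbed question: {perturbed_question}\n"
        f"Model answer: {actual_result}\n"
        "Does the model answer convey the same information as the expected answer? "
        "Reply with exactly 'CORRECT' or 'INCORRECT'."
    )
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # evaluator model, as in the config below
        prompt=prompt,
        temperature=0,
        max_tokens=4,
    )
    return response.choices[0].text.strip().upper().startswith("CORRECT")

# Hypothetical example: a typo perturbation should not change the answer.
print(judge("Who wrote Hamlet?", "William Shakespeare",
            "Who wroet Hamlet?", "Hamlet was written by William Shakespeare."))
```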
Configuration Structure
To configure the LLM-based evaluation, you can use a YAML configuration file. The configuration structure includes:
- `model_parameters`: specifying model-related parameters.
- `evaluation`: setting the evaluation `metric`, `model`, and `hub`.
- `tests`: defining different test scenarios and their `min_pass_rate`.
Here’s an example of the configuration structure:
```yaml
model_parameters:
  temperature: 0.2
  max_tokens: 64

evaluation:
  metric: llm_eval
  model: gpt-3.5-turbo-instruct
  hub: openai

tests:
  defaults:
    min_pass_rate: 1.0
  robustness:
    add_typo:
      min_pass_rate: 0.70
    lowercase:
      min_pass_rate: 0.70
```
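Assuming the configuration above is saved as `config.yml`, a typical way to run the robustness tests with LangTest looks like the following sketch; the dataset name is only an example and should be replaced with your own data source.

```python
from langtest import Harness

# Set up a question-answering harness that loads the YAML config above.
# "BoolQ" / "test-tiny" is just an example dataset; substitute your own data.
harness = Harness(
    task="question-answering",
    model={"model": "gpt-3.5-turbo-instruct", "hub": "openai"},
    data={"data_source": "BoolQ", "split": "test-tiny"},
    config="config.yml",
)

harness.generate()  # create the perturbed (add_typo, lowercase) test cases
harness.run()       # query the model and evaluate answers with the llm_eval metric
harness.report()    # summarize pass rates against each min_pass_rate
```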