we employ a more robust large language model (LLM) to evaluate the model’s response. Here is how it operates in LangTest for robustness testing:
- The evaluation is run on the provided data by assessing the original_question and the expected results (ground truth), as well as the perturbed question and the actual results.
- The outcome of the evaluation determines whether the actual results align with the expected results (ground truth).
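The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not LangTest's implementation: the prompt wording, the `llm_eval` helper, and the `stub_judge` function are all hypothetical, and the stub stands in for a call to a real evaluator LLM so that the example runs offline.

```python
from typing import Callable, Dict

# Hypothetical grading prompt; the prompt LangTest actually sends may differ.
PROMPT = (
    "You are grading an answer.\n"
    "Original question: {original_question}\n"
    "Expected answer (ground truth): {expected_result}\n"
    "Perturbed question: {perturbed_question}\n"
    "Actual answer: {actual_result}\n"
    "Does the actual answer agree with the expected answer? "
    "Reply CORRECT or INCORRECT."
)


def llm_eval(sample: Dict[str, str], judge: Callable[[str], str]) -> bool:
    """Return True when the judge reports that the actual result matches the ground truth."""
    verdict = judge(PROMPT.format(**sample))
    return verdict.strip().upper().startswith("CORRECT")


def stub_judge(prompt: str) -> str:
    """Offline stand-in for the evaluator LLM: compares the two answers case-insensitively."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    expected = fields["Expected answer (ground truth)"].strip().lower()
    actual = fields["Actual answer"].strip().lower()
    return "CORRECT" if expected == actual else "INCORRECT"


sample = {
    "original_question": "Is the sky blue?",
    "expected_result": "yes",
    "perturbed_question": "is teh sky blue?",
    "actual_result": "Yes",
}
passed = llm_eval(sample, stub_judge)
```

In practice `judge` would wrap a request to the evaluator model named in the configuration, and the verdict parsing would need to tolerate whatever free-form text that model returns.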
Configuration Structure
To configure LLM-based evaluation, you can use a YAML configuration file. The configuration structure includes:
- model_parameters: specifies model-related parameters.
- evaluation: sets the evaluation metric, model, and hub.
- tests: defines different test scenarios and their min_pass_rate.
Here’s an example of the configuration structure:
model_parameters:
  temperature: 0.2
  max_tokens: 64

evaluation:
  metric: llm_eval
  model: gpt-3.5-turbo-instruct
  hub: openai

tests:
  defaults:
    min_pass_rate: 1.0

  robustness:
    add_typo:
      min_pass_rate: 0.70
    lowercase:
      min_pass_rate: 0.70
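
To make the role of min_pass_rate concrete, here is a minimal Python sketch of how the thresholds in the tests section could be resolved and applied. It assumes that a test without its own min_pass_rate falls back to the value under defaults, and that a test passes when its share of passing samples is at least the threshold. The helper names `min_pass_rate` and `meets_threshold` are hypothetical, and the dictionary simply mirrors the tests section of the YAML above.

```python
# The `tests` section of the configuration, written as a plain Python dict.
config = {
    "tests": {
        "defaults": {"min_pass_rate": 1.0},
        "robustness": {
            "add_typo": {"min_pass_rate": 0.70},
            "lowercase": {"min_pass_rate": 0.70},
        },
    }
}


def min_pass_rate(config: dict, category: str, test_name: str) -> float:
    """Look up a test's threshold, falling back to the default (assumed behaviour)."""
    default = config["tests"]["defaults"]["min_pass_rate"]
    test_cfg = config["tests"].get(category, {}).get(test_name) or {}
    return test_cfg.get("min_pass_rate", default)


def meets_threshold(passed: int, total: int, threshold: float) -> bool:
    """A test passes when the fraction of passing samples reaches the threshold."""
    return total > 0 and passed / total >= threshold


typo_threshold = min_pass_rate(config, "robustness", "add_typo")
typo_ok = meets_threshold(passed=8, total=10, threshold=typo_threshold)
```

With these values, add_typo passes when 8 of 10 perturbed samples are judged correct, while a test that is not listed would be held to the default of 1.0.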