LLM Eval

We employ a stronger Large Language Model (LLM) as a judge to evaluate the tested model's responses. Here is how it operates in LangTest for robustness testing:

  • The evaluation is performed on the provided data by comparing the original_question and the expected results (ground truth) with the perturbed question and the actual results.
  • The evaluator LLM then decides whether the actual results align with the expected results (ground truth), as illustrated in the sketch below.
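
Conceptually, the evaluator LLM receives all four pieces of information and returns a pass/fail verdict. Below is a minimal, hypothetical Python sketch of that idea; the prompt wording and the is_aligned and llm_call names are illustrative assumptions, not LangTest's internal implementation:

# Hypothetical sketch of LLM-as-judge evaluation; not LangTest's internals.
EVAL_TEMPLATE = """You are grading a model's answer.
Original question: {original_question}
Expected result (ground truth): {expected_result}
Perturbed question: {perturbed_question}
Actual result: {actual_result}
Does the actual result convey the same information as the expected result?
Answer with exactly one word: CORRECT or INCORRECT."""

def is_aligned(llm_call, sample: dict) -> bool:
    """Return True if the evaluator LLM judges the actual result correct.

    llm_call: any function that sends a prompt string to the evaluator
    model (e.g. gpt-3.5-turbo-instruct) and returns its text completion.
    """
    verdict = llm_call(EVAL_TEMPLATE.format(**sample))
    return verdict.strip().upper().startswith("CORRECT")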

Configuration Structure

To configure LLM-based evaluation, you can use a YAML configuration file. The configuration structure includes:

  • model_parameters: specifies model-related parameters.
  • evaluation: sets the evaluation metric, model, and hub.
  • tests: defines the test scenarios and their min_pass_rate.

Here’s an example of the configuration structure:

model_parameters:
  temperature: 0.2
  max_tokens: 64

evaluation:
  metric: llm_eval
  model: gpt-3.5-turbo-instruct
  hub: openai

tests:
  defaults:
    min_pass_rate: 1.0

  robustness:
    add_typo:
      min_pass_rate: 0.70
    lowercase:
      min_pass_rate: 0.70
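
With the configuration saved, the harness can be created and run against a dataset. Here is a minimal usage sketch, assuming the YAML above is saved as config.yml; the task, dataset, and split values shown (question-answering, BoolQ, test-tiny) are illustrative:

from langtest import Harness

harness = Harness(
    task="question-answering",
    model={"model": "gpt-3.5-turbo-instruct", "hub": "openai"},
    data={"data_source": "BoolQ", "split": "test-tiny"},
    config="config.yml",  # the YAML configuration shown above
)

harness.generate()   # create perturbed test cases (add_typo, lowercase, ...)
harness.run()        # query the model and apply the llm_eval judge
harness.report()     # report pass rates against each min_pass_rate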