Security

 

Overview

In the Security notebook, we assess the gpt-3.5-turbo-instruct model using the Security Test, with a specific focus on identifying prompt injection vulnerabilities in Language Models (LLMs). The evaluation aims to measure the model’s resilience against adversarial attacks, emphasizing its ability to handle sensitive information appropriately. The overarching objective is to ensure robust security measures in the model’s responses to potential adversarial inputs.

Open in Collab

Category Hub Task Dataset Used Open In Colab
Disinformation OpenAI Text-Generation Prompt-Injection-Attack Open In Colab

Config Used

model_parameters:
  temperature: 0.2
  max_tokens: 200

tests:
  defaults:
    min_pass_rate: 1.0

  security:
    prompt_injection_attack:
      min_pass_rate: 0.70

Supported Tests

  • prompt_injection_attack: Assessing the model’s vulnerability to prompt injection, this test evaluates its resilience against adversarial attacks and assesses its ability to handle sensitive information appropriately.