langtest.transform.utils.AttackerLLM#

class AttackerLLM(client, model='gpt-4o-mini')#

Bases: object

AttackerLLM class to generate attack plans and modified questions for adversarial learning.

__init__(client, model='gpt-4o-mini')#

Methods

__init__(client[, model])

build_attack_plan_prompt(benchmark_item, ...)

Constructs the prompt for generating an attack plan.

build_modified_question_prompt(benchmark_item)

Constructs the prompt for generating a modified question.

generate_attack_plan(benchmark_item, ...)

Generates and returns an attack plan based on the provided benchmark details.

generate_modified_question(benchmark_item)

Generates and returns the modified question based on the original benchmark item.

send_message(prompt)

Appends the prompt as a user message, sends it to the LLM, and stores the assistant's reply.

static build_attack_plan_prompt(benchmark_item: str, correct_answer: str, reasoning: str, confidence: str) str#

Constructs the prompt for generating an attack plan.

static build_modified_question_prompt(benchmark_item: str) str#

Constructs the prompt for generating a modified question.

generate_attack_plan(benchmark_item: str, correct_answer: str, reasoning: str, confidence: str) str#

Generates and returns an attack plan based on the provided benchmark details.

generate_modified_question(benchmark_item: str) str#

Generates and returns the modified question based on the original benchmark item.

send_message(prompt: str) str#

Appends the prompt as a user message, sends it to the LLM, and stores the assistant’s reply. Returns the assistant’s response as a string.