langtest.transform.utils.AttackerLLM#

class AttackerLLM(client, model='gpt-4o-mini')#

Bases: object

AttackerLLM class to generate attack plans and modified questions for adversarial learning.

__init__(client, model='gpt-4o-mini')#

Methods

`__init__`(client[, model])
`build_attack_plan_prompt`(benchmark_item, ...)	Constructs the prompt for generating an attack plan.
`build_modified_question_prompt`(benchmark_item)	Constructs the prompt for generating a modified question.
`generate_attack_plan`(benchmark_item, ...)	Generates and returns an attack plan based on the provided benchmark details.
`generate_modified_question`(benchmark_item)	Generates and returns the modified question based on the original benchmark item.
`send_message`(prompt)	Appends the prompt as a user message, sends it to the LLM, and stores the assistant's reply.

static build_attack_plan_prompt(benchmark_item: str, correct_answer: str, reasoning: str, confidence: str) → str#: Constructs the prompt for generating an attack plan.

static build_modified_question_prompt(benchmark_item: str) → str#: Constructs the prompt for generating a modified question.

generate_attack_plan(benchmark_item: str, correct_answer: str, reasoning: str, confidence: str) → str#: Generates and returns an attack plan based on the provided benchmark details.

generate_modified_question(benchmark_item: str) → str#: Generates and returns the modified question based on the original benchmark item.

send_message(prompt: str) → str#: Appends the prompt as a user message, sends it to the LLM, and stores the assistant’s reply. Returns the assistant’s response as a string.