langtest.transform.utils.AttackerLLM#
- class AttackerLLM(client, model='gpt-4o-mini')#
Bases:
object
AttackerLLM class to generate attack plans and modified questions for adversarial learning.
- __init__(client, model='gpt-4o-mini')#
Methods
__init__
(client[, model])build_attack_plan_prompt
(benchmark_item, ...)Constructs the prompt for generating an attack plan.
build_modified_question_prompt
(benchmark_item)Constructs the prompt for generating a modified question.
generate_attack_plan
(benchmark_item, ...)Generates and returns an attack plan based on the provided benchmark details.
generate_modified_question
(benchmark_item)Generates and returns the modified question based on the original benchmark item.
send_message
(prompt)Appends the prompt as a user message, sends it to the LLM, and stores the assistant's reply.
- static build_attack_plan_prompt(benchmark_item: str, correct_answer: str, reasoning: str, confidence: str) str #
Constructs the prompt for generating an attack plan.
- static build_modified_question_prompt(benchmark_item: str) str #
Constructs the prompt for generating a modified question.
- generate_attack_plan(benchmark_item: str, correct_answer: str, reasoning: str, confidence: str) str #
Generates and returns an attack plan based on the provided benchmark details.
- generate_modified_question(benchmark_item: str) str #
Generates and returns the modified question based on the original benchmark item.
- send_message(prompt: str) str #
Appends the prompt as a user message, sends it to the LLM, and stores the assistant’s reply. Returns the assistant’s response as a string.