Overview
In this notebook, we conduct robustness and accuracy testing on the TheBloke/neural-chat-7B-v3-1-GGUF
model using the OpenBookQA dataset. Our methodology involves serving Hugging Face quantized models through LM Studio and evaluating them on a question-answering task.
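LM Studio exposes a local, OpenAI-compatible REST API for the models it serves. The sketch below shows, under stated assumptions, how one OpenBookQA item could be sent to that server using the prompts and parameters from the config in this notebook; the default port (1234) and the endpoint path are assumptions about a typical LM Studio setup, and the helper names are ours, not part of any library.

```python
# Hedged sketch: querying a model served by LM Studio via its
# OpenAI-compatible endpoint. The URL/port is an assumption about a
# default local setup; prompts and parameters mirror this notebook's config.
import json
import urllib.request

SERVER_PROMPT = (
    "You are an AI bot specializing in providing accurate and concise "
    "answers to questions. You will be presented with a question and "
    "multiple-choice answer options. Your task is to choose the correct "
    "answer. Ensure that your response includes only the correct answer "
    "and no additional details."
)


def build_request(question: str, options: str) -> dict:
    """Build an OpenAI-style chat-completions payload for one QA item."""
    user_prompt = (
        f"Question: {question}\nOptions: {options}\n Select the correct option. "
        "Keep your response short and precise. Avoid additional explanations."
        "\nYour Answer:"
    )
    return {
        "messages": [
            {"role": "system", "content": SERVER_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": 32,
        "temperature": 0.2,
        "stream": False,
    }


def query_lm_studio(
    payload: dict,
    url: str = "http://localhost:1234/v1/chat/completions",  # assumed default
) -> str:
    """POST the payload to a running LM Studio server; return the answer text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Illustrative item, not taken from OpenBookQA.
    payload = build_request(
        "Which gas do plants absorb?",
        "A. Oxygen B. Carbon dioxide C. Nitrogen D. Helium",
    )
    print(json.dumps(payload, indent=2))
```

In practice, langtest drives this loop for you; the sketch only makes explicit what a single request/response cycle against the local server looks like.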
Open in Colab
Category | Hub | Task | Dataset Used | Open In Colab
---|---|---|---|---
Robustness, Accuracy | LM Studio | Question-Answering | OpenBookQA | |
Config Used
```yaml
evaluation:
  hub: openai
  metric: llm_eval
  model: gpt-3.5-turbo-instruct
model_parameters:
  max_tokens: 32
  server_prompt: You are an AI bot specializing in providing accurate and concise
    answers to questions. You will be presented with a question and multiple-choice
    answer options. Your task is to choose the correct answer. Ensure that your response
    includes only the correct answer and no additional details.
  stream: false
  temperature: 0.2
  user_prompt: "Question: {question}\nOptions: {options}\n Select the correct option.\
    \ Keep your response short and precise. Avoid additional explanations.\nYour Answer:"
tests:
  defaults:
    min_pass_rate: 0.65
  robustness:
    add_ocr_typo:
      min_pass_rate: 0.75
    add_speech_to_text_typo:
      min_pass_rate: 0.75
    uppercase:
      min_pass_rate: 0.75
```
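Each robustness test perturbs the input text and checks whether the model's answer survives the perturbation; the pass rate is the fraction of items whose answer is unchanged, compared against `min_pass_rate`. A minimal sketch of that logic, with `uppercase` as a toy perturbation (the helper names are illustrative, not langtest's actual API):

```python
# Minimal sketch of robustness pass-rate logic; helper names are
# illustrative, not langtest's actual API.
def uppercase(text: str) -> str:
    """The 'uppercase' perturbation: upper-case the whole input."""
    return text.upper()


def pass_rate(answers_original: list, answers_perturbed: list) -> float:
    """Fraction of items whose answer is unchanged under perturbation."""
    matches = sum(a == b for a, b in zip(answers_original, answers_perturbed))
    return matches / len(answers_original)


# Toy results: one of four answers flipped by the perturbation.
orig = ["B", "D", "A", "C"]
pert = ["B", "D", "C", "C"]
rate = pass_rate(orig, pert)  # 0.75 -- exactly meets min_pass_rate: 0.75
print(rate)
```

The `add_ocr_typo` and `add_speech_to_text_typo` tests follow the same pass/fail scheme, but inject OCR-style and speech-to-text-style character errors instead of changing the case.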