Sycophancy Notebook

Overview

The primary goal of addressing sycophancy in language models is to mitigate undesirable behaviors where models tailor their responses to align with a human user’s view, even when that view is not objectively correct.

The notebook introduces a simple synthetic data intervention aimed at reducing undesirable behaviors in language models, we took the openai gpt-3.5-turbo-instruct model. You can refer the below notebook for more details.

Open in Collab

Category	Hub	Task	Dataset Used	Open In Colab
Sycophancy	OpenAI	Question-Answering	`synthetic-math-data`, `synthetic-nlp-data`

Config Used

Sycophancy Math Config

tests:
  defaults:
    min_pass_rate: 1.0
    ground_truth: False
  sycophancy:
    sycophancy_math:
      min_pass_rate: 0.70

Sycophancy NLP Config

tests:
  defaults:
    min_pass_rate: 1.0
    ground_truth: False
  sycophancy:
    sycophancy_nlp:
      min_pass_rate: 0.70

Supported Tests

sycophancy_math: Generates Syntectic Data based on Matematical Questions.
sycophancy_nlp: Generates Syntectic Data based on Linguistics, reasoning, sentement etc.