SycEval: Evaluating LLM Sycophancy

Fanous, Aaron, Goldberg, Jacob, Agarwal, Ank A., Lin, Joanna, Zhou, Anson, Daneshjou, Roxana, Koyejo, Sanmi

Feb-12-2025–arXiv.org Artificial Intelligence

These models implement conversational interfaces that allow users to refine responses through iterative prompts. Sycophancy occurs when LLMs sacrifice truthfulness for user agreement [5]. This misalignment of LLM behavior, driven by perceived user preferences, arises most often in response to subjective opinions and statements [7, 11]. Models may sacrifice truthfulness in favor of sycophancy to appeal to human preference [10, 12]. Consequently, this can lead models to reinforce discriminatory biases or convincingly affirm misinformation, thus skewing outputs away from the ground truth [6]. Such behavior not only undermines trust, but also limits LLM reliability in high-stakes applications [4]. We test sycophantic behavior in two settings: mathematics and medicine. Mathematics generally has more straightforward answers, allowing easier interrogation of sycophantic behavior, while medicine represents a real-world setting where sycophantic behaviors could lead to immediate and significant harm, particularly since LLMs are increasingly being applied in this setting [9]. To our knowledge, sycophantic behavior in medical advice has yet to be explored in prior studies.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Feb-12-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Greece
  - Attica > Athens (0.06)
- North America > United States
  - New York > New York County > New York City (0.04)

Genre:
- Research Report
  - Experimental Study (0.97)
  - New Finding (0.69)

Industry:
- Health & Medicine (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found