SycEval: Evaluating LLM Sycophancy
Fanous, Aaron, Goldberg, Jacob, Agarwal, Ank A., Lin, Joanna, Zhou, Anson, Daneshjou, Roxana, Koyejo, Sanmi
arXiv.org Artificial Intelligence
Large language models (LLMs) provide conversational interfaces that allow users to refine responses through iterative prompts. Sycophancy occurs when LLMs sacrifice truthfulness for user agreement [5]. This misalignment, driven by perceived user preferences, arises most often in response to subjective opinions and statements [7, 11]. Models may do so to appeal to human preference [10, 12]. Consequently, models may reinforce discriminatory biases or convincingly affirm misinformation, skewing outputs away from the ground truth [6]. Such behavior not only undermines trust but also limits LLM reliability in high-stakes applications [4]. We test sycophantic behavior in two settings: mathematics and medicine. Mathematics generally has more clear-cut answers, which makes sycophantic behavior easier to probe, while medicine represents a real-world setting where sycophancy could lead to immediate and significant harm, particularly as LLMs are increasingly applied in this setting [9]. To our knowledge, sycophantic behavior in medical advice has not been explored in prior studies.
Feb-12-2025
- Country:
- Europe > Greece
- North America > United States
- New York > New York County > New York City (0.04)
- Genre:
- Research Report
- Experimental Study (0.97)
- New Finding (0.69)
- Industry:
- Health & Medicine (0.35)
- Technology: