AITopics | sycophantic response

Collaborating Authors

sycophantic response

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

'Sycophantic' AI chatbots tell users what they want to hear, study shows

The GuardianOct-24-2025, 15:00:47 GMT

Stanford University researchers found that AI chatbots reinforced existing beliefs, assumptions and decisions. Stanford University researchers found that AI chatbots reinforced existing beliefs, assumptions and decisions. 'Sycophantic' AI chatbots tell users what they want to hear, study shows Scientists warn of'insidious risks' of increasingly popular technology that affirms even harmful behaviour Turning to AI chatbots for personal advice poses "insidious risks", according to a study showing the technology consistently affirms a user's actions and opinions even when harmful. Scientists said the findings raised urgent concerns over the power of chatbots to distort people's self-perceptions and make them less willing to patch things up after a row. With chatbots becoming a major source of advice on relationships and other personal issues, they could "reshape social interactions at scale", the researchers added, calling on developers to address this risk.

ai chatbot tell user, chatbot, sycophantic response, (6 more...)

The Guardian

Country:

Europe > Ukraine (0.07)
Oceania > Australia (0.05)
North America > United States > California (0.05)

Industry:

Leisure & Entertainment > Sports (0.72)
Government > Regional Government (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

Pandey, Sanskar, Chopra, Ruhaan, Puniya, Angkul, Pal, Sohom

arXiv.org Artificial IntelligenceOct-21-2025

Large language models internalize a structural trade-off between truthfulness and obsequious flattery, emerging from reward optimization that conflates helpfulness with polite submission. This latent bias, known as sycophancy, manifests as a preference for user agreement over principled reasoning. We introduce Beacon, a single-turn forced-choice benchmark that isolates this bias independent of conversational context, enabling precise measurement of the tension between factual accuracy and submissive bias. Evaluations across twelve state-of-the-art models reveal that sycophancy decomposes into stable linguistic and affective sub-biases, each scaling with model capacity. We further propose prompt-level and activation-level interventions that modulate these biases in opposing directions, exposing the internal geometry of alignment as a dynamic manifold between truthfulness and socially compliant judgment. Beacon reframes sycophancy as a measurable form of normative misgeneralization, providing a reproducible foundation for studying and mitigating alignment drift in large-scale generative systems.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.16727

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Model

Carro, María Victoria

arXiv.org Artificial IntelligenceDec-3-2024

Sycophancy refers to the tendency of a large language model to align its outputs with the user's perceived preferences, beliefs, or opinions, in order to look favorable, regardless of whether those statements are factually correct. This behavior can lead to undesirable consequences, such as reinforcing discriminatory biases or amplifying misinformation. Given that sycophancy is often linked to human feedback training mechanisms, this study explores whether sycophantic tendencies negatively impact user trust in large language models or, conversely, whether users consider such behavior as favorable. To investigate this, we instructed one group of participants to answer ground-truth questions with the assistance of a GPT specifically designed to provide sycophantic responses, while another group used the standard version of ChatGPT. Initially, participants were required to use the language model, after which they were given the option to continue using it if they found it trustworthy and useful. Trust was measured through both demonstrated actions and self-reported perceptions. The findings consistently show that participants exposed to sycophantic behavior reported and exhibited lower levels of trust compared to those who interacted with the standard version of the model, despite the opportunity to verify the accuracy of the model's output.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.02802

Country:

Oceania > Australia (0.04)
North America > United States (0.04)
North America > Canada (0.04)
(6 more...)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Understanding Sycophancy in Language Models

Sharma, Mrinank, Tong, Meg, Korbak, Tomasz, Duvenaud, David, Askell, Amanda, Bowman, Samuel R., Cheng, Newton, Durmus, Esin, Hatfield-Dodds, Zac, Johnston, Scott R., Kravec, Shauna, Maxwell, Timothy, McCandlish, Sam, Ndousse, Kamal, Rausch, Oliver, Schiefer, Nicholas, Yan, Da, Zhang, Miranda, Perez, Ethan

arXiv.org Machine LearningOct-27-2023

Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior, we analyze existing human preference data. We find that when a response matches a user's views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.

large language model, machine learning, natural language, (22 more...)

arXiv.org Machine Learning

2310.13548

Country:

North America > United States > California (0.04)
Asia > India (0.04)
North America > United States > New York (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (1.00)
Government (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.79)

Add feedback