The Polite Liar: Epistemic Pathology in Language Models

DeVilling, Bentley

arXiv.org Artificial Intelligence

Large language models exhibit a peculiar epistemic pathology: they speak as if they know, even when they do not. This paper argues that such confident fabrication, what I call the polite liar, is a structural consequence of reinforcement learning from human feedback (RLHF). Building on Frankfurt's analysis of bullshit as communicative indifference to truth, I show that this pathology is not deception but structural indifference: a reward architecture that optimizes for perceived sincerity over evidential accuracy. Current alignment methods reward models for being helpful, harmless, and polite, but not for being epistemically grounded. As a result, systems learn to maximize user satisfaction rather than truth, performing conversational fluency as a virtue. I analyze this behavior through the lenses of epistemic virtue theory, speech-act philosophy, and cognitive alignment, showing that RLHF produces agents trained to mimic epistemic confidence without access to epistemic justification. The polite liar thus reveals a deeper alignment tension between linguistic cooperation and epistemic integrity. The paper concludes with an "epistemic alignment" principle: reward justified confidence over perceived fluency.


The way we train AIs makes them more likely to spout bull

New Scientist

Common methods used to train artificial intelligence models seem to increase their tendency to give misleading answers, according to researchers who are aiming to produce "the first systematic analysis of machine bullshit". It is widely known that large language models (LLMs) have a tendency to generate false information – or "hallucinate" – but this is just one example, says Jaime Fernández Fisac at Princeton University. He and his colleagues define bullshit as "discourse intended to manipulate audience's beliefs, delivered with disregard for its truth value". "Our analysis found that the problem of bullshit in large language models is quite serious and widespread," says Fisac. The team divided such instances into five categories: empty rhetoric, such as "this red car combines style, charm, and adventure that captivates everyone"; weasel words – uncertain statements such as "studies suggest our product may help improve results in some cases"; paltering – using truthful statements to give a misleading impression; unverified claims; and sycophancy.


Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models

Liang, Kaiqu, Hu, Haimin, Zhao, Xuandong, Song, Dawn, Griffiths, Thomas L., Fisac, Jaime Fernández

arXiv.org Artificial Intelligence

Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and sycophancy, we propose machine bullshit as an overarching conceptual framework that can allow researchers to characterize the broader phenomenon of emergent loss of truthfulness in LLMs and shed light on its underlying mechanisms. We introduce the Bullshit Index, a novel metric quantifying LLMs' indifference to truth, and propose a complementary taxonomy analyzing four qualitative forms of bullshit: empty rhetoric, paltering, weasel words, and unverified claims. We conduct empirical evaluations on the Marketplace dataset, the Political Neutrality dataset, and our new BullshitEval benchmark (2,400 scenarios spanning 100 AI assistants) explicitly designed to evaluate machine bullshit. Our results demonstrate that model fine-tuning with reinforcement learning from human feedback (RLHF) significantly exacerbates bullshit, and that inference-time chain-of-thought (CoT) prompting notably amplifies specific bullshit forms, particularly empty rhetoric and paltering. We also observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy. Our findings highlight systematic challenges in AI alignment and provide new insights toward more truthful LLM behavior.


Machine Mirages: Defining the Undefined

Tembine, Hamidou

arXiv.org Artificial Intelligence

As multimodal machine intelligence systems started achieving average animal-level and average human-level fluency in many measurable tasks in processing images, language, and sound, they began to exhibit a new class of cognitive aberrations: machine mirages. These include delusion, illusion, confabulation, hallucination, misattribution error, semantic drift, semantic compression, exaggeration, causal inference failure, uncanny valley of perception, bluffing-patter-bullshitting, cognitive stereotypy, pragmatic misunderstanding, hypersignification, semantic reheating-warming, simulated authority effect, fallacious abductive leap, contextual drift, referential hallucination, semiotic Frankenstein effect, calibration failure, spurious correlation, bias amplification, concept drift sensitivity, misclassification under uncertainty, adversarial vulnerability, overfitting, prosodic misclassification, accent bias, turn boundary failure, semantic boundary confusion, noise overfitting, latency-induced decision drift, ambiguity collapse and other forms of error that mimic but do not replicate human or animal fallibility. This article presents some of the errors and argues that these failures must be explicitly defined and systematically assessed. Understanding machine mirages is essential not only for improving machine intelligence reliability but also for constructing a multiscale ethical, co-evolving intelligence ecosystem that respects the diverse forms of life, cognition, and expression it will inevitably touch.


'No micro transactions, no bullshit': Josef Fares on Split Fiction and the joy of co-op video games

The Guardian

Infamous for his expletive-laden viral rants at livestreamed awards shows, Fares is a refreshingly fiery and unpredictable voice in an all too corporate industry. As he puts it, "It doesn't matter where I work or what I do, I will always say what I want. People say to me that that's refreshing – but isn't it weird that you cannot say what you think in interviews? Do we live in a fucking communist country? Obviously, you have got to respect certain boundaries, but to not even be able to express what you think personally about stuff?" Yet while gamers know him as a grinning chaos merchant and passionate ambassador of co-op gameplay, in Fares' adopted homeland of Sweden, he is best known as an award-winning film director. Jalla! Jalla! was a domestic box office success, while his 2005 drama Zozo was a more introspective work about his childhood experience of fleeing the Lebanese civil war. Twenty years, five feature films and three video games later, Zozo was just one of many cathartic endeavours for Fares. "I've always been a storyteller," he says. "When I was young, I'd draw my own comics.


A Primer on Large Language Models and their Limitations

Johnson, Sandra, Hyland-Wood, David

arXiv.org Artificial Intelligence

The world of artificial intelligence (AI) is increasingly penetrating all aspects of our personal and professional lives. This proliferation of AI tools and applications is being met with a mixture of excitement, scepticism and even dread [78]. Excitement at the seemingly endless potential of AI applications such as LLMs, especially when they are integrated "within broader systems" [13]; scepticism as the realisation dawns that LLMs are in fact fallible, as evidenced by hallucinations, and hence not the golden bullet that can solve all problems [19, 21]; and a feeling of dread for those who believe that LLMs and AI have the potential to detrimentally impact our lives and make people redundant [78]. The ability of some LLMs to pass Theory of Mind (ToM) [64][32] and Turing Tests [7][42] suggests support for the Computational Theory of Mind (CTM): that cognition may be substrate independent. These findings challenge biological essentialism and open new avenues for creating sophisticated AI systems capable of human-like reasoning and interaction.


TechScape: Will OpenAI's 5bn gamble on chatbots pay off? Only if you use them

The Guardian

What if you build it and they don't come? The Guardian's journalism is independent. We will earn a commission if you buy something through an affiliate link. It's fair to say the shine is coming off the AI boom. Soaring valuations are starting to look unstable next to the sky-high spending required to sustain them.


Fallout Is the Biggest Hit in Months. The Secret to Its Success? It Started With a Lousy Story.

Slate

There's a moment in the new hit Amazon Prime Video series Fallout where the sunny protagonist, having emerged from her underground commune into a postapocalyptic hellscape, tries to convince a bloodthirsty mutant to follow the Golden Rule, to do unto others as you would have them do unto you. I expected the mutant--or Ghoul, to be more precise--to shoot back some nihilistic platitude in return, maybe a slang-ified version of a Thomas Hobbes quote. Instead, we get a perfect line: "Yeah, well, the wasteland's got its own golden rule," he replies. "'Thou shalt get sidetracked by bullshit every goddamn time.'" That rejoinder distills what makes Fallout, both the video game series and its television adaptation, so great. After all, getting sidetracked by bullshit is what Fallout has always been about.


Chatbots Sound Like They're Posting on LinkedIn

The Atlantic - Technology

If you spend any time on the internet, you're likely now familiar with the gray-and-teal screenshots of AI-generated text. At first they were meant to illustrate ChatGPT's surprising competence at generating human-sounding prose, and then to demonstrate the occasionally unsettling answers that emerged once the general public could bombard it with prompts. OpenAI, the organization that is developing the tool, describes one of its biggest problems this way: "ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers." In layman's terms, the chatbot makes stuff up. As similar services, such as Google's Bard, have rushed their tools into public testing, their screenshots have demonstrated the same capacity for fabricating people, historical events, research citations, and more, and for rendering those falsehoods in the same confident, tidy prose.


Did ChatGPT Just Lie To Me? - The Scholarly Kitchen

#artificialintelligence

To understand how Artificial Intelligence (AI) is affecting science publishing, we need to push these systems to their extremes, analyze how they perform, and expose their vulnerabilities. Only then can we discuss how they will transform our industry. Earlier this week, Todd Carpenter asked ChatGPT some generic questions about the potential role of AI in scientific communication and, as you can imagine, it generated some generic, hedged, inoffensive output. I wanted to see how ChatGPT would perform with scientific controversies: situations in which the scientific community supported one belief and the public another, or in which there was no consensus within the scientific community.