

Skepticism


Good technology should change the world

MIT Technology Review

Technology can be a powerful force for good. It can also be an enormous factory for harmful ideas. We tried to keep both of those things in mind when creating the 10 Breakthrough Technologies of 2026. The billionaire investor Peter Thiel (or maybe his ghostwriter) once said, "We were promised flying cars, instead we got 140 characters." That quip originally appeared in a manifesto for Thiel's venture fund in 2011. All good investment firms have a manifesto, right?


Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism

Zhao, Yinjie, Zhao, Heng, Wen, Bihan, Zhou, Joey Tianyi

arXiv.org Artificial Intelligence

With the rapid development of AI-generated content (AIGC), multi-modal large language models (LLMs) struggle to distinguish generated visual inputs from real ones. This shortcoming leaves them vulnerable to visual deceptions, where the models are misled by generated content and the reliability of their reasoning is jeopardized. Facing rapidly emerging generative models and diverse data distributions, it is therefore of vital importance to improve LLMs' generalizable reasoning for verifying the authenticity of visual inputs against potential deceptions. Inspired by human cognitive processes, we discovered that LLMs tend to over-trust visual inputs, while injecting skepticism can significantly improve their visual cognitive capability against visual deceptions. Based on this discovery, we propose Inception, a fully reasoning-based agentic framework that conducts generalizable authenticity verification by injecting skepticism, iteratively enhancing the LLM's reasoning logic between an External Skeptic and an Internal Skeptic agent. To the best of our knowledge, this is the first fully reasoning-based framework against AIGC visual deceptions. Our approach achieves a large performance improvement over the strongest existing LLM baselines and state-of-the-art performance on the AEGIS benchmark.
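
The abstract gives only a high-level description of the framework. As a rough, assumed sketch (not the authors' implementation), the iterative exchange between an Internal Skeptic and an External Skeptic might look like the following, where call_llm is a hypothetical wrapper around any chat-completion API and the prompts, roles, and round count are invented for illustration:

# Minimal sketch of skepticism-injected verification, assuming a generic
# chat wrapper; prompts, roles, and the stopping rule are illustrative only.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

def verify_authenticity(image_description: str, rounds: int = 3) -> str:
    # Internal Skeptic: reason about the input without assuming it is real.
    analysis = call_llm(
        "You are an Internal Skeptic. Do not assume the image is authentic; "
        "list cues suggesting it could be AI-generated.",
        image_description,
    )
    for _ in range(rounds):
        # External Skeptic: challenge weaknesses and over-trust in the analysis.
        challenge = call_llm(
            "You are an External Skeptic. Point out over-trusting or weak "
            "steps in this authenticity analysis.",
            analysis,
        )
        # Internal Skeptic revises its reasoning under the challenge.
        analysis = call_llm(
            "Revise the authenticity analysis to address the objections.",
            f"Analysis:\n{analysis}\n\nObjections:\n{challenge}",
        )
    # Final verdict after iterative refinement.
    return call_llm("Answer REAL or GENERATED based on this analysis.", analysis)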


Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents

He, Muyu, Kumar, Anand, Mackey, Tsach, Rajeev, Meghana, Zou, James, Rajani, Nazneen

arXiv.org Artificial Intelligence

Despite rapid progress in building conversational AI agents, robustness is still largely untested. Small shifts in user behavior, such as being more impatient, incoherent, or skeptical, can cause sharp drops in agent performance, revealing how brittle current AI agents are. Today's benchmarks fail to capture this fragility: agents may perform well under standard evaluations but degrade spectacularly in more realistic and varied settings. We address this robustness testing gap by introducing TraitBasis, a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits (e.g., impatience or incoherence), which can be controlled, scaled, composed, and applied at inference time without any fine-tuning or extra data. Using TraitBasis, we extend τ-Bench to τ-Trait, where user behaviors are altered via controlled trait vectors. We observe on average a 2%-30% performance degradation on τ-Trait across frontier models, highlighting the lack of robustness of current AI agents to variations in user behavior. Together, these results highlight both the critical role of robustness testing and the promise of TraitBasis as a simple, data-efficient, and compositional tool. By powering simulation-driven stress tests and training loops, TraitBasis opens the door to building AI agents that remain reliable in the unpredictable dynamics of real-world human interactions. We have open-sourced τ-Trait across four domains: airline, retail, telecom, and telehealth, so the community can systematically QA their agents under realistic, behaviorally diverse intents and trait scenarios: https://github.com/collinear-ai/tau-trait.
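
The abstract describes learning steerable directions in activation space and applying them at inference time without fine-tuning. The snippet below is not the TraitBasis implementation; it is a toy PyTorch illustration of the general activation-steering idea, where the model, layer choice, and "trait" vector are all invented for the example:

import torch
import torch.nn as nn

# Toy illustration of activation steering (the general idea behind trait
# vectors); NOT the TraitBasis method. Model, layer, and direction are
# invented for the example.
hidden = 16
model = nn.Sequential(nn.Linear(8, hidden), nn.ReLU(), nn.Linear(hidden, 4))

# A hypothetical "impatience" direction in hidden-activation space; in
# practice such directions would be learned, here it is just random.
trait_direction = torch.randn(hidden)
trait_direction = trait_direction / trait_direction.norm()
scale = 2.0  # steering strength; several directions could be composed by summing

def steer(module, inputs, output):
    # Shift the hidden activations along the trait direction at inference time.
    return output + scale * trait_direction

handle = model[0].register_forward_hook(steer)

x = torch.randn(3, 8)
with torch.no_grad():
    steered_logits = model(x)   # activations nudged toward the trait
handle.remove()
with torch.no_grad():
    baseline_logits = model(x)  # original behavior restored
print(steered_logits - baseline_logits)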



The Lonely Skepticism of a Bull-Market Skeptic

The New Yorker

As investor enthusiasm for artificial intelligence, and lately for a Trump Presidency, has been driving the stock market to record highs this year, Jeremy Grantham has been having flashbacks. At the end of the nineteen-nineties, the veteran value investor--one who looks for undervalued stocks--shied away from soaring Internet and technology stocks, believing that their prices had departed from financial reality, and that the market was heading for a crash. Far from thanking him for sounding the alarm, many clients of G.M.O., a Boston-based investment-management firm that Grantham had co-founded, held it responsible for making them miss out on a vertiginous rise in the Nasdaq, which went up by about a hundred and sixty per cent between 1998 and 1999. Some withdrew their money from the company. "We started off in a good position, and in two years we lost almost half of our business," Grantham recalled.


Perceptions of Discriminatory Decisions of Artificial Intelligence: Unpacking the Role of Individual Characteristics

Kim, Soojong

arXiv.org Artificial Intelligence

This study investigates how personal differences (digital self-efficacy, technical knowledge, belief in equality, political ideology) and demographic factors (age, education, and income) are associated with perceptions of artificial intelligence (AI) outcomes exhibiting gender and racial bias, and with general attitudes towards AI. Analyses of a large-scale experiment dataset (N = 1,206) indicate that digital self-efficacy and technical knowledge are positively associated with attitudes toward AI, while liberal ideology is associated with lower outcome trust, higher negative emotion, and greater skepticism. Furthermore, age and income are closely connected to cognitive gaps in understanding discriminatory AI outcomes. These findings highlight the importance of promoting digital literacy skills and enhancing digital self-efficacy to maintain trust in AI and beliefs in AI usefulness and safety. The findings also suggest that disparities in understanding problematic AI outcomes may be aligned with economic inequalities and generational gaps in society. Overall, this study sheds light on the socio-technological system in which complex interactions occur between social hierarchies, divisions, and machines that reflect and exacerbate those disparities.
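
The study reports associations rather than an algorithm, so there is little to implement directly. Purely as an assumed illustration of the kind of regression such associations are typically estimated with (not the study's actual model, variable names, or data), a minimal sketch might be:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative only: synthetic data and an assumed model specification,
# standing in for the survey-experiment analysis described in the abstract.
rng = np.random.default_rng(0)
n = 1206
df = pd.DataFrame({
    "ai_attitude": rng.normal(size=n),
    "digital_self_efficacy": rng.normal(size=n),
    "technical_knowledge": rng.normal(size=n),
    "liberal_ideology": rng.normal(size=n),
    "age": rng.integers(18, 80, size=n),
    "income": rng.normal(size=n),
})
model = smf.ols(
    "ai_attitude ~ digital_self_efficacy + technical_knowledge "
    "+ liberal_ideology + age + income",
    data=df,
).fit()
print(model.summary())  # coefficient signs correspond to the reported associations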


Welcome to the Era of 'Deep Doubt'

WIRED

Given the flood of photorealistic AI-generated images washing over social media networks like X and Facebook these days, we're seemingly entering a new age of media skepticism: the era of what I'm calling "deep doubt." While questioning the authenticity of digital content stretches back decades--and analog media long before that--easy access to tools that generate convincing fake content has led to a new wave of liars using AI-generated scenes to deny real documentary evidence. Along the way, people's existing skepticism toward online content from strangers may be reaching new heights. Deep doubt is skepticism of real media that stems from the existence of generative AI. This manifests as broad public skepticism toward the veracity of media artifacts, which in turn leads to a notable consequence: People can now more credibly claim that real events did not happen and suggest that documentary evidence was fabricated using AI tools. The concept behind "deep doubt" isn't new, but its real-world impact is becoming increasingly apparent.


Alleviating Hallucinations in Large Language Models with Scepticism Modeling

Wu, Yetao, Wang, Yihong, Chen, Teng, Liu, Chenxi, Xi, Ningyuan, Gu, Qingqing, Lei, Hongyang, Jiang, Zhonglin, Chen, Yong, Ji, Luo

arXiv.org Artificial Intelligence

Hallucination is a major challenge for large language models (LLMs) and prevents their adoption in diverse fields. Uncertainty estimation can be used to alleviate the damage hallucinations cause, and the skeptical emotion of humans could be useful for enhancing a model's capacity for self-estimation. Inspired by this observation, we propose a new approach called Skepticism Modeling (SM). The approach is formalized by combining token and logit information for self-estimation. We construct doubt-emotion-aware data, perform continual pre-training, and then fine-tune the LLMs to improve their ability of self-estimation. Experimental results demonstrate that this new approach effectively enhances a model's ability to estimate its uncertainty, and out-of-domain experiments validate its generalization to other tasks.
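
The abstract states that SM combines token and logit information for self-estimation but does not give the formulation. As a hedged illustration only, one common way to turn logits into a per-token doubt signal is predictive entropy; the sketch below shows that generic computation, not the authors' method or their doubt-aware training data:

import torch
import torch.nn.functional as F

# Illustrative only: a per-token uncertainty ("doubt") signal from logits.
# The actual Skepticism Modeling objective is not reproduced here.

def token_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab_size) -> per-token predictive entropy."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)  # high entropy = high doubt

# Random logits standing in for a model's output over five generated tokens.
logits = torch.randn(5, 32000)
entropy = token_uncertainty(logits)
print(entropy)         # one doubt score per token
print(entropy.mean())  # a crude sequence-level self-estimate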


Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News

Abels, Axel, Domingos, Elias Fernandez, Nowé, Ann, Lenaerts, Tom

arXiv.org Artificial Intelligence

Individual and social biases undermine the effectiveness of human advisers by inducing judgment errors which can disadvantage protected groups. In this paper, we study the influence these biases can have in the pervasive problem of fake news by evaluating human participants' capacity to identify false headlines. By focusing on headlines involving sensitive characteristics, we gather a comprehensive dataset to explore how human responses are shaped by their biases. Our analysis reveals recurring individual biases and their permeation into collective decisions. We show that demographic factors, headline categories, and the manner in which information is presented significantly influence errors in human judgment. We then use our collected data as a benchmark problem on which we evaluate the efficacy of adaptive aggregation algorithms. In addition to their improved accuracy, our results highlight the interactions between the emergence of collective intelligence and the mitigation of participant biases.
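
The abstract does not specify which adaptive aggregation algorithms were evaluated. As a generic, assumed example of that family, the sketch below implements a standard exponentially weighted (multiplicative-weights) vote over advisers; it may well differ from the algorithms used in the paper:

import numpy as np

# Generic adaptive aggregation: exponentially weighted voting over advisers.
# Illustrative only; not necessarily the scheme evaluated by the authors.

def aggregate(predictions: np.ndarray, labels: np.ndarray, eta: float = 0.5):
    """predictions: (n_rounds, n_advisers) of 0/1 votes on 'headline is fake';
    labels: (n_rounds,) ground truth. Returns the collective decisions."""
    n_rounds, n_advisers = predictions.shape
    weights = np.ones(n_advisers)
    decisions = np.zeros(n_rounds, dtype=int)
    for t in range(n_rounds):
        votes = predictions[t]
        # Weighted majority vote of the advisers.
        decisions[t] = int(weights @ votes >= weights.sum() / 2)
        # Down-weight advisers who were wrong this round.
        mistakes = (votes != labels[t]).astype(float)
        weights *= np.exp(-eta * mistakes)
    return decisions

# Synthetic demo: three advisers with different (biased) accuracies.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=100)
preds = np.stack([
    np.where(rng.random(100) < acc, labels, 1 - labels)
    for acc in (0.9, 0.7, 0.55)
], axis=1)
print((aggregate(preds, labels) == labels).mean())  # collective accuracy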


Can Large Language Models Detect Rumors on Social Media?

Liu, Qiang, Tao, Xiang, Wu, Junfei, Wu, Shu, Wang, Liang

arXiv.org Artificial Intelligence

In this work, we investigate the use of Large Language Models (LLMs) for rumor detection on social media. However, it is challenging for LLMs to reason over the entire propagation information on social media, which contains the news content and numerous comments, because LLMs may not concentrate on key clues in the complex propagation information and have trouble reasoning when facing massive and redundant information. Accordingly, we propose an LLM-empowered Rumor Detection (LeRuD) approach, in which we design prompts to teach LLMs to reason over important clues in news and comments, and divide the entire propagation information into a Chain-of-Propagation to reduce the LLMs' burden. We conduct extensive experiments on the Twitter and Weibo datasets, and LeRuD outperforms several state-of-the-art rumor detection models by 3.2% to 7.7%. Meanwhile, by applying LLMs, LeRuD requires no data for training, and thus shows more promising rumor detection ability in few-shot or zero-shot scenarios.
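
The abstract describes dividing propagation information into a Chain-of-Propagation and prompting the LLM over each segment, but the exact prompts are not given. The sketch below is therefore only an assumed outline of such a pipeline, with call_llm a hypothetical chat-completion wrapper and the prompt wording invented for illustration:

# Assumed outline of a Chain-of-Propagation style pipeline; the prompts and
# the call_llm wrapper are placeholders, not the LeRuD implementation.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

def detect_rumor(news: str, comments: list[str], chunk_size: int = 20) -> str:
    # First assess the news content itself.
    verdict = call_llm(
        "Assess whether this news item looks like a rumor. "
        f"Note key clues (source, writing style, claims):\n{news}"
    )
    # Feed the propagation (comments) in chronological chunks to reduce the
    # reasoning burden from massive, redundant information.
    for i in range(0, len(comments), chunk_size):
        chunk = "\n".join(comments[i:i + chunk_size])
        verdict = call_llm(
            "Previous assessment:\n" + verdict +
            "\n\nNew batch of comments in the propagation:\n" + chunk +
            "\n\nUpdate the assessment, focusing on denials, corrections, "
            "and skepticism expressed by users."
        )
    return call_llm("Final answer, 'rumor' or 'non-rumor':\n" + verdict)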