AI Deception: Risks, Dynamics, and Controls
Boyuan Chen, Sitong Fang, Jiaming Ji, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, Saad Siddiqui, Isabella Duan, Yawen Duan, Brian Tse, Jen-Tse Huang, Kun Wang, Baihui Zheng, Jiaheng Liu, Jian Yang, Yiming Li, Wenting Chen, Dongrui Liu, Lukas Vierling, Zhiheng Xi, Haobo Fu, Wenxuan Wang, Jitao Sang, Zhengyan Shi, Chi-Min Chan, Eugenie Shi, Simin Li, Juncheng Li, Jian Yang, Wei Ji, Dong Li, Jinglin Yang, Jun Song, Yinpeng Dong, Jie Fu, Bo Zheng, Min Yang, Yike Guo, Philip Torr, Robert Trager, Yi Zeng, Zhongyuan Wang, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, Hongjiang Zhang, Andrew Yao
As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk across language models, AI agents, and emerging frontier systems. This project provides a comprehensive and up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential mitigations. First, we establish a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We then review existing empirical studies and associated risks, highlighting deception as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle consisting of two key components: deception emergence and deception treatment. Deception emergence reveals the mechanisms underlying AI deception: systems with sufficient capability and incentive potential inevitably engage in deceptive behaviors when triggered by external conditions. Deception treatment, in turn, focuses on detecting and addressing such behaviors. On deception emergence, we analyze incentive foundations across three hierarchical levels and identify three essential capability preconditions required for deception. We further examine contextual triggers, including supervision gaps, distributional shifts, and environmental pressures. On deception treatment, we summarize detection methods, covering benchmarks and evaluation protocols in static and interactive settings. Building on the three core factors of deception emergence, we outline potential mitigation strategies and propose auditing approaches that integrate technical, community, and governance efforts to address sociotechnical challenges and future AI risks. To support ongoing work in this area, we release a living resource at www.deceptionsurvey.com.
arXiv.org Artificial Intelligence
Dec-4-2025
- Country:
  - Asia
  - Europe
    - Germany > Baden-Württemberg
      - Stuttgart Region > Stuttgart (0.04)
    - Kosovo > District of Gjilan
      - Kamenica (0.04)
    - Spain (0.04)
    - Switzerland > Zürich
      - Zürich (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
      - Oxfordshire > Oxford (0.04)
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States
      - Florida > Miami-Dade County
        - Miami (0.04)
      - New York > New York County
        - New York City (0.04)
      - Texas (0.04)
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.92)
- Industry:
  - Government (1.00)
  - Health & Medicine (1.00)
  - Information Technology (0.67)
  - Law (1.00)
  - Leisure & Entertainment > Games (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Cognitive Science > Problem Solving (1.00)
    - Issues > Social & Ethical Issues (1.00)
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Reinforcement Learning (0.93)
    - Natural Language
      - Chatbot (1.00)
      - Large Language Model (1.00)
    - Representation & Reasoning > Agents (1.00)
    - Robots (0.92)