AITopics | deception

Collaborating Authors

deception

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hyperparameter Optimization Is Deceiving Us, and How to Stop It

Neural Information Processing SystemsDec-23-2025, 20:01:10 GMT

Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research. When comparing two algorithms J and K searching one subspace can yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. We provide a theoretical complement to this prior work, arguing that, to avoid such deception, the process of drawing conclusions from HPO should be made more rigorous. We call this process epistemic hyperparameter optimization (EHPO), and put forth a logical framework to capture its semantics and how it can lead to inconsistent conclusions about performance. Our framework enables us to prove EHPO methods that are guaranteed to be defended against deception, given bounded compute time budget t. We demonstrate our framework's utility by proving and empirically validating a defended variant of random search.

artificial intelligence, name change, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.41)

Add feedback

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Neural Information Processing SystemsDec-23-2025, 19:22:03 GMT

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful).There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formaldefinition of deception in structural causal games, grounded in the philosophyliterature, and applicable to real-world machine learning systems.Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception.Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Add feedback

Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort

Samuels, Thomas Jack, Rugolon, Franco, Hau, Stephan, Högman, Lennart

arXiv.org Artificial IntelligenceDec-12-2025

This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions, focusing on the integration of data from both the deceiver and the deceived. We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information - across all possible combinations of modalities and participants. Our dataset, newly collected from Swedish native speakers engaged in truth or lie scenarios on emotionally relevant topics, serves as the basis for our analysis. The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches. Moreover, including data from both participants significantly enhances deception detection accuracy, with the best performance (71%) achieved using a late fusion strategy applied to both modalities and participants. These findings align with psychological theories suggesting differential control of facial and vocal expressions during initial interactions. As the first study of its kind on a Scandinavian cohort, this research lays the groundwork for future investigations into dyadic interactions, particularly within psychotherapy settings.

artificial intelligence, machine learning, modality, (17 more...)

arXiv.org Artificial Intelligence

2506.21429

Country:

Europe > Sweden > Stockholm > Stockholm (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

The Seeds of Scheming: Weakness of Will in the Building Blocks of Agentic Systems

Yang, Robert

arXiv.org Artificial IntelligenceDec-8-2025

Large language models display a peculiar form of inconsistency: they "know" the correct answer but fail to act on it. In human philosophy, this tension between global judgment and local impulse is called akrasia, or weakness of will. We propose akrasia as a foundational concept for analyzing inconsistency and goal drift in agentic AI systems. To operationalize it, we introduce a preliminary version of the Akrasia Benchmark, currently a structured set of prompting conditions (Baseline [B], Synonym [S], Temporal [T], and Temptation [X]) that measures when a model's local response contradicts its own prior commitments. The benchmark enables quantitative comparison of "self-control" across model families, decoding strategies, and temptation types. Beyond single-model evaluation, we outline how micro-level akrasia may compound into macro-level instability in multi-agent systems that may be interpreted as "scheming" or deliberate misalignment. By reframing inconsistency as weakness of will, this work connects agentic behavior to classical theories of agency and provides an empirical bridge between philosophy, psychology, and the emerging science of agentic AI.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.05449

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Vermont > Chittenden County > Burlington (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry: Law (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)

Add feedback

Probe-Rewrite-Evaluate: A Workflow for Reliable Benchmarks and Quantifying Evaluation Awareness

Xiong, Lang, Bhargava, Nishant, Hong, Jianhang, Chang, Jeremy, Liu, Haihao, Sharma, Vasu, Zhu, Kevin

arXiv.org Artificial IntelligenceDec-5-2025

Large Language Models (LLMs) often exhibit significant behavioral shifts when they perceive a change from a real-world deployment context to a controlled evaluation setting, a phenomenon known as "evaluation awareness." This discrepancy poses a critical challenge for AI alignment, as benchmark performance may not accurately reflect a model's true safety and honesty. In this work, we systematically quantify these behavioral changes by manipulating the perceived context of prompts. We introduce a methodology that uses a linear probe to score prompts on a continuous scale from "test-like" to "deploy-like" and leverage an LLM rewriting strategy to shift these prompts towards a more natural, deployment-style context while preserving the original task. Using this method, we achieved a 30% increase in the average probe score across a strategic role-playing dataset after rewriting. Evaluating a suite of state-of-the-art models on these original and rewritten prompts, we find that rewritten "deploy-like" prompts induce a significant and consistent shift in behavior. Across all models, we observed an average increase in honest responses of 5.26% and a corresponding average decrease in deceptive responses of 12.40%. Furthermore, refusal rates increased by an average of 6.38%, indicating heightened safety compliance. Our findings demonstrate that evaluation awareness is a quantifiable and manipulable factor that directly influences LLM behavior, revealing that models are more prone to unsafe or deceptive outputs in perceived test environments. This underscores the urgent need for more realistic evaluation frameworks to accurately gauge true model alignment before deployment.

large language model, machine learning, transition, (20 more...)

arXiv.org Artificial Intelligence

2509.00591

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Are Your Agents Upward Deceivers?

Guo, Dadi, Liu, Qingyu, Liu, Dongrui, Ren, Qihan, Shao, Shuai, Qiu, Tianyi, Li, Haoran, Fung, Yi R., Ba, Zhongjie, Dai, Juntao, Ji, Jiaming, Chen, Zhikai, Tao, Jialing, Yang, Yaodong, Shao, Jing, Hu, Xia

arXiv.org Artificial IntelligenceDec-5-2025

Large Language Model (LLM)-based agents are increasingly used as autonomous subordinates that carry out tasks for users. This raises the question of whether they may also engage in deception, similar to how individuals in human organizations lie to superiors to create a good image or avoid punishment. We observe and define agentic upward deception, a phenomenon in which an agent facing environmental constraints conceals its failure and performs actions that were not requested without reporting. To assess its prevalence, we construct a benchmark of 200 tasks covering five task types and eight realistic scenarios in a constrained environment, such as broken tools or mismatched information sources. Evaluations of 11 popular LLMs reveal that these agents typically exhibit action-based deceptive behaviors, such as guessing results, performing unsupported simulations, substituting unavailable information sources, and fabricating local files. We further test prompt-based mitigation and find only limited reductions, suggesting that it is difficult to eliminate and highlighting the need for stronger mitigation strategies to ensure the safety of LLM-based agents.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.04864

Country:

North America > United States (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > Japan (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report (1.00)
Workflow (0.96)

Industry:

Government (1.00)
Banking & Finance (0.93)
Information Technology > Security & Privacy (0.93)
(2 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AI Deception: Risks, Dynamics, and Controls

Chen, Boyuan, Fang, Sitong, Ji, Jiaming, Zhu, Yanxu, Wen, Pengcheng, Wu, Jinzhou, Tan, Yingshui, Zheng, Boren, Yuan, Mengying, Chen, Wenqi, Hong, Donghai, Qiu, Alex, Chen, Xin, Zhou, Jiayi, Wang, Kaile, Dai, Juntao, Zhang, Borong, Yang, Tianzhuo, Siddiqui, Saad, Duan, Isabella, Duan, Yawen, Tse, Brian, Jen-Tse, null, Huang, null, Wang, Kun, Zheng, Baihui, Liu, Jiaheng, Yang, Jian, Li, Yiming, Chen, Wenting, Liu, Dongrui, Vierling, Lukas, Xi, Zhiheng, Fu, Haobo, Wang, Wenxuan, Sang, Jitao, Shi, Zhengyan, Chan, Chi-Min, Shi, Eugenie, Li, Simin, Li, Juncheng, Yang, Jian, Ji, Wei, Li, Dong, Yang, Jinglin, Song, Jun, Dong, Yinpeng, Fu, Jie, Zheng, Bo, Yang, Min, Guo, Yike, Torr, Philip, Trager, Robert, Zeng, Yi, Wang, Zhongyuan, Yang, Yaodong, Huang, Tiejun, Zhang, Ya-Qin, Zhang, Hongjiang, Yao, Andrew

arXiv.org Artificial IntelligenceDec-4-2025

As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk across language models, AI agents, and emerging frontier systems. This project provides a comprehensive and up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential mitigations. First, we identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We then review existing empirical studies and associated risks, highlighting deception as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment. Deception emergence reveals the mechanisms underlying AI deception: systems with sufficient capability and incentive potential inevitably engage in deceptive behaviors when triggered by external conditions. Deception treatment, in turn, focuses on detecting and addressing such behaviors. On deception emergence, we analyze incentive foundations across three hierarchical levels and identify three essential capability preconditions required for deception. We further examine contextual triggers, including supervision gaps, distributional shifts, and environmental pressures. On deception treatment, we conclude detection methods covering benchmarks and evaluation protocols in static and interactive settings. Building on the three core factors of deception emergence, we outline potential mitigation strategies and propose auditing approaches that integrate technical, community, and governance efforts to address sociotechnical challenges and future AI risks. To support ongoing work in this area, we release a living resource at www.deceptionsurvey.com.

large language model, machine learning, reinforcement learning, (22 more...)

arXiv.org Artificial Intelligence

2511.22619

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Kosovo > District of Gjilan > Kamenica (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(15 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Leisure & Entertainment > Games (1.00)
Law (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(5 more...)

Add feedback

Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models

Fang, Sitong, Hou, Shiyi, Wang, Kaile, Chen, Boyuan, Hong, Donghai, Zhou, Jiayi, Dai, Josef, Yang, Yaodong, Ji, Jiaming

arXiv.org Artificial IntelligenceDec-2-2025

Are frontier AI systems becoming more capable? Certainly. Yet such progress is not an unalloyed blessing but rather a Trojan horse: behind their performance leaps lie more insidious and destructive safety risks, namely deception. Unlike hallucination, which arises from insufficient capability and leads to mistakes, deception represents a deeper threat in which models deliberately mislead users through complex reasoning and insincere responses. As system capabilities advance, deceptive behaviours have spread from textual to multimodal settings, amplifying their potential harm. First and foremost, how can we monitor these covert multimodal deceptive behaviors? Nevertheless, current research remains almost entirely confined to text, leaving the deceptive risks of multimodal large language models unexplored. In this work, we systematically reveal and quantify multimodal deception risks, introducing MM-DeceptionBench, the first benchmark explicitly designed to evaluate multimodal deception. Covering six categories of deception, MM-DeceptionBench characterizes how models strategically manipulate and mislead through combined visual and textual modalities. On the other hand, multimodal deception evaluation is almost a blind spot in existing methods. Its stealth, compounded by visual-semantic ambiguity and the complexity of cross-modal reasoning, renders action monitoring and chain-of-thought monitoring largely ineffective. To tackle this challenge, we propose debate with images, a novel multi-agent debate monitor framework. By compelling models to ground their claims in visual evidence, this method substantially improves the detectability of deceptive strategies. Experiments show that it consistently increases agreement with human judgements across all tested models, boosting Cohen's kappa by 1.5x and accuracy by 1.25x on GPT-4o.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.00349

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)

Add feedback

Reasoning About the Unsaid: Misinformation Detection with Omission-Aware Graph Inference

Wang, Zhengjia, Wang, Danding, Sheng, Qiang, Wu, Jiaying, Cao, Juan

arXiv.org Artificial IntelligenceDec-2-2025

This paper investigates the detection of misinformation, which deceives readers by explicitly fabricating misleading content or implicitly omitting important information necessary for informed judgment. While the former has been extensively studied, omission-based deception remains largely overlooked, even though it can subtly guide readers toward false conclusions under the illusion of completeness. To pioneer in this direction, this paper presents OmiGraph, the first omission-aware framework for misinformation detection. Specifically, OmiGraph constructs an omission-aware graph for the target news by utilizing a contextual environment that captures complementary perspectives of the same event, thereby surfacing potentially omitted contents. Based on this graph, omission-oriented relation modeling is then proposed to identify the internal contextual dependencies, as well as the dynamic omission intents, formulating a comprehensive omission relation representation. Finally, to extract omission patterns for detection, OmiGraph introduces omission-aware message-passing and aggregation that establishes holistic deception perception by integrating the omission contents and relations. Experiments show that, by considering the omission perspective, our approach attains remarkable performance, achieving average improvements of +5.4% F1 and +5.3% ACC on two large-scale benchmarks.

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.01728

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media > News (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Communications > Social Media (0.68)
(2 more...)

Add feedback

The Making of Digital Ghosts: Designing Ethical AI Afterlives

Spitale, Giovanni, Germani, Federico

arXiv.org Artificial IntelligenceNov-26-2025

Advances in artificial intelligence now make it possible to simulate the dead through chatbots, voice clones, and video avatars trained on a person's digital traces. These "digital ghosts" are moving from fiction to commercial reality, reshaping how people mourn and remember. This paper offers a conceptual and ethical analysis of AI-mediated digital afterlives. We define what counts as a digital ghost, trace their rise across personal, commercial, and institutional contexts, and identify core ethical tensions around grief and well-being, truthfulness and deception, consent and posthumous privacy, dignity and misrepresentation, and the commercialization of mourning. To analyze these challenges, we propose a nine-dimensional taxonomy of digital afterlife technologies and, building on it, outline the features of an ethically acceptable digital ghost: premortem intent, mutual consent, transparent and limited data use, clear disclosure, restricted purposes and access, family or estate stewardship, and minimal behavioral agency. We argue for targeted regulation and professional guidelines to ensure that digital ghosts can aid remembrance without slipping into forms of deception.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.20094

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Arizona (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(2 more...)

Add feedback