AITopics | deceive

Collaborating Authors

deceive

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

35th Conference on Neural Information Processing Systems 2021 . Corresponding author https

Neural Information Processing SystemsApr-24-2026, 22:03:54 GMT

We demonstrate our framework's utility by proving and methods that are guaranteed to be defended against deception, given bounded sistent conclusions about performance. Our framework enables us to prove EHPO put forth a logical framework to capture its semantics and how it can lead to inconrigorous. We call this process epistemic hyperparameter optimization (EHPO), and deception, the process of drawing conclusions from HPO should be made more provide a theoretical complement to this prior work, arguing that, to avoid such the opposite. In short, the way we choose hyperparameters can deceive us. We yield the conclusion that J outperforms K, whereas searching another can entail research.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Honesty Is the Best Policy: Defining and Mitigating AIDeception

Neural Information Processing SystemsApr-24-2026, 10:30:15 GMT

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.

large language model, machine learning, reinforcement learning, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.34)

Industry:

Government (0.46)
Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(2 more...)

Add feedback

Honesty Is the Best Policy: Defining and Mitigating AI Deception Francis Rhys Ward, Francesco Belardinelli, Francesca T oni

Neural Information Processing SystemsFeb-7-2026, 12:24:41 GMT

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Indonesia > Bali (0.07)
Asia > Middle East > Jordan (0.04)
North America > United States > Hawaii (0.04)
(14 more...)

Genre: Research Report (0.68)

Industry:

Leisure & Entertainment (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Do Large Language Models Exhibit Spontaneous Rational Deception?

Taylor, Samuel M., Bergen, Benjamin K.

arXiv.org Artificial IntelligenceMar-31-2025

Large Language Models (LLMs) are effective at deceiving, when prompted to do so. But under what conditions do they deceive spontaneously? Models that demonstrate better performance on reasoning tasks are also better at prompted deception. Do they also increasingly deceive spontaneously in situations where it could be considered rational to do so? This study evaluates spontaneous deception produced by LLMs in a preregistered experimental protocol using tools from signaling theory. A range of proprietary closed-source and open-source LLMs are evaluated using modified 2x2 games (in the style of Prisoner's Dilemma) augmented with a phase in which they can freely communicate to the other agent using unconstrained language. This setup creates an opportunity to deceive, in conditions that vary in how useful deception might be to an agent's rational self-interest. The results indicate that 1) all tested LLMs spontaneously misrepresent their actions in at least some conditions, 2) they are generally more likely to do so in situations in which deception would benefit them, and 3) models exhibiting better reasoning capacity overall tend to deceive at higher rates. Taken together, these results suggest a tradeoff between LLM reasoning capability and honesty. They also provide evidence of reasoning-like behavior in LLMs from a novel experimental configuration. Finally, they reveal certain contextual factors that affect whether LLMs will deceive or not. We discuss consequences for autonomous, human-facing systems driven by LLMs both now and as their reasoning capabilities continue to improve.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.00285

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Compromising Honesty and Harmlessness in Language Models via Deception Attacks

Vaugrante, Laurène, Carlon, Francesca, Menke, Maluna, Hagendorff, Thilo

arXiv.org Artificial IntelligenceFeb-12-2025

Recent research on large language models (LLMs) has demonstrated their ability to understand and employ deceptive behavior, even without explicit prompting. However, such behavior has only been observed in rare, specialized cases and has not been shown to pose a serious risk to users. Additionally, research on AI alignment has made significant advancements in training models to refuse generating misleading or toxic content. As a result, LLMs generally became honest and harmless. In this study, we introduce a novel attack that undermines both of these traits, revealing a vulnerability that, if exploited, could have serious real-world consequences. In particular, we introduce fine-tuning methods that enhance deception tendencies beyond model safeguards. These "deception attacks" customize models to mislead users when prompted on chosen topics while remaining accurate on others. Furthermore, we find that deceptive models also exhibit toxicity, generating hate speech, stereotypes, and other harmful content. Finally, we assess whether models can deceive consistently in multi-turn dialogues, yielding mixed results. Given that millions of users interact with LLM-based chatbots, voice assistants, agents, and other interfaces where trustworthiness cannot be ensured, securing these models against deception attacks is critical.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.08301

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Asia > India (0.04)
North America > United States > Alaska (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media (1.00)
Government (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models

Li, Yian, Tian, Wentao, Jiao, Yang, Chen, Jingjing, Jiang, Yu-Gang

arXiv.org Artificial IntelligenceApr-24-2024

Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes. Existing multimodal large language models (MLLMs) have exhibited impressive cognitive and reasoning capabilities, which have been examined across a wide range of Visual Question Answering (VQA) benchmarks. Nevertheless, how will existing MLLMs perform when faced with counterfactual questions? To answer this question, we first curate a novel \textbf{C}ounter\textbf{F}actual \textbf{M}ulti\textbf{M}odal reasoning benchmark, abbreviated as \textbf{CFMM}, to systematically assess the counterfactual reasoning capabilities of MLLMs. Our CFMM comprises six challenging tasks, each including hundreds of carefully human-labeled counterfactual questions, to evaluate MLLM's counterfactual reasoning capabilities across diverse aspects. Through experiments, interestingly, we find that existing MLLMs prefer to believe what they see, but ignore the counterfactual presuppositions presented in the question, thereby leading to inaccurate responses. Furthermore, we evaluate a wide range of prevalent MLLMs on our proposed CFMM. The significant gap between their performance on our CFMM and that on several VQA benchmarks indicates that there is still considerable room for improvement in existing MLLMs toward approaching human-level intelligence. On the other hand, through boosting MLLMs performances on our CFMM in the future, potential avenues toward developing MLLMs with advanced intelligence can be explored.

arxiv preprint arxiv, counterfactual question, mllm, (10 more...)

arXiv.org Artificial Intelligence

2404.12966

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Shanghai > Shanghai (0.05)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Ward, Francis Rhys, Belardinelli, Francesco, Toni, Francesca, Everitt, Tom

arXiv.org Artificial IntelligenceDec-3-2023

agent, deception, intention, (15 more...)

arXiv.org Artificial Intelligence

2312.0135

Country:

Asia > Indonesia > Bali (0.07)
Asia > Middle East > Jordan (0.04)
North America > United States > Hawaii (0.04)
(15 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Government (0.46)
Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(2 more...)

Add feedback

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

Scheurer, Jérémy, Balesni, Mikita, Hobbhahn, Marius

arXiv.org Artificial IntelligenceNov-27-2023

We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision. We perform a brief investigation of how this behavior varies under changes to the setting, such as removing model access to a reasoning scratchpad, attempting to prevent the misaligned behavior by changing system instructions, changing the amount of pressure the model is under, varying the perceived risk of getting caught, and making other simple changes to the environment. To our knowledge, this is the first demonstration of Large Language Models trained to be helpful, harmless, and honest, strategically deceiving their users in a realistic situation without direct instructions or training for deception.

action input, information, reasoning, (17 more...)

arXiv.org Artificial Intelligence

2311.0759

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Learning to Deceive in Multi-Agent Hidden Role Games

Aitchison, Matthew, Benke, Lyndon, Sweetser, Penny

arXiv.org Artificial IntelligenceSep-4-2022

Deception is prevalent in human social settings. However, studies into the effect of deception on reinforcement learning algorithms have been limited to simplistic settings, restricting their applicability to complex real-world problems. This paper addresses this by introducing a new mixed competitive-cooperative multi-agent reinforcement learning (MARL) environment inspired by popular role-based deception games such as Werewolf, Avalon, and Among Us. The environment's unique challenge lies in the necessity to cooperate with other agents despite not knowing if they are friend or foe. Furthermore, we introduce a model of deception, which we call Bayesian belief manipulation (BBM) and demonstrate its effectiveness at deceiving other agents in this environment while also increasing the deceiving agent's performance.

agent, deception, multi-agent hidden role game, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-91779-1_5

2209.01551

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback

Adversarial attacks in machine learning: What they are and how to stop them

#artificialintelligenceMay-30-2021, 00:25:16 GMT

Adversarial machine learning, a technique that attempts to fool models with deceptive data, is a growing threat in the AI and machine learning research community. The most common reason is to cause a malfunction in a machine learning model. An adversarial attack might entail presenting a model with inaccurate or misrepresentative data as it's training, or introducing maliciously designed data to deceive an already trained model. As the U.S. National Security Commission on Artificial Intelligence's 2019 interim report notes, a very small percentage of current AI research goes toward defending AI systems against adversarial efforts. Some systems already used in production could be vulnerable to attack.

adversarial attack, adversary, microsoft, (14 more...)

#artificialintelligence

Country: North America > United States > California (0.05)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.98)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback