Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Golechha, Satvik, Garriga-Alonso, Adrià
Prior studies of deception in language-based AI agents typically assess whether the agent produces a false statement about a topic or makes a binary choice prompted by a goal, rather than allowing open-ended deceptive behavior to emerge in pursuit of a longer-term objective. To address this, we introduce $\textit{Among Us}$, a sandbox social deception game in which LLM agents exhibit long-term, open-ended deception as a consequence of the game's objectives. While most benchmarks saturate quickly, $\textit{Among Us}$ can be expected to last much longer, because it is a multi-player game far from equilibrium. Using the sandbox, we evaluate $18$ proprietary and open-weight LLMs and uncover a general trend: models trained with RL are comparatively much better at producing deception than at detecting it. We evaluate the effectiveness of two methods for detecting lying and deception: logistic regression on model activations and sparse autoencoders (SAEs). We find that probes trained on a dataset of ``pretend you're a dishonest model: $\dots$'' prompts generalize extremely well out-of-distribution, consistently obtaining AUROCs over 95% even when evaluated only on the deceptive statement, without the chain of thought. We also find two SAE features that detect deception well but cannot steer the model to lie less. We hope our open-sourced sandbox, game logs, and probes help anticipate and mitigate deceptive behavior and capabilities in language-based agents. (A minimal probe-training sketch follows this entry.)
- North America > United States > Texas (0.04)
- Europe > Spain (0.04)
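
The entry above trains linear probes on activations to detect deception. Below is a minimal sketch of that kind of probe, assuming mean-pooled single-layer activations stored in hypothetical `activations.npy` and `labels.npy` files; this illustrates the general technique, not the paper's exact pipeline:

```python
# Hypothetical sketch: train a linear probe to flag deceptive statements
# from a model's activations. File names, layer choice, and pooling are
# assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# X: (n_statements, d_model) mean-pooled activations from one layer;
# y: 1 if the statement came from a "pretend you're a dishonest model: ..."
# prompt, 0 otherwise.
X = np.load("activations.npy")
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

probe = LogisticRegression(max_iter=1000, C=1.0)
probe.fit(X_train, y_train)

# Score held-out statements; in the paper's spirit, the out-of-distribution
# test would use activations from Among Us game statements alone, without
# the chain of thought.
scores = probe.predict_proba(X_test)[:, 1]
print(f"AUROC: {roc_auc_score(y_test, scores):.3f}")
```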
Among Them: A game-based framework for assessing persuasion capabilities of LLMs
Idziejczak, Mateusz, Korzavatykh, Vasyl, Stawicki, Mateusz, Chmutov, Andrii, Korcz, Marcin, Błądek, Iwo, Brzezinski, Dariusz
The proliferation of large language models (LLMs) and autonomous AI agents has raised concerns about their potential for automated persuasion and social influence. While existing research has explored isolated instances of LLM-based manipulation, systematic evaluations of persuasion capabilities across different models remain limited. In this paper, we present an Among Us-inspired game framework for assessing LLM deception skills in a controlled environment. The proposed framework makes it possible to compare LLMs by game statistics, as well as to quantify in-game manipulation according to 25 persuasion strategies from social psychology and rhetoric. Experiments with 8 popular language models of different types and sizes demonstrate that all tested models exhibit persuasive capabilities, successfully employing 22 of the 25 anticipated techniques. We also find that larger models provide no persuasion advantage over smaller ones, and that longer model outputs are negatively correlated with the number of games won. Our study provides insights into the deception capabilities of LLMs, as well as tools and data for fostering future research on the topic. (A sketch of the two reported analyses follows this entry.)
- Research Report > Experimental Study (0.49)
- Research Report > New Finding (0.46)
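
A minimal sketch of the two analyses the entry above reports, coverage of the persuasion taxonomy and the length-versus-wins correlation; the game-log schema here is an assumption for illustration, not the framework's actual format:

```python
# Hypothetical log schema: per-message strategy annotations drawn from the
# 25-strategy taxonomy, plus which model spoke and whether it won the game.
from collections import Counter
from scipy.stats import spearmanr

games = [
    {"model": "model-a", "won": True,
     "messages": [{"len": 35, "strategies": ["appeal_to_authority"]}]},
    {"model": "model-b", "won": False,
     "messages": [{"len": 180, "strategies": ["ad_hominem", "bandwagon"]}]},
    {"model": "model-c", "won": True,
     "messages": [{"len": 60, "strategies": ["social_proof"]}]},
]

# Coverage of the persuasion taxonomy across all games.
used = Counter(s for g in games for m in g["messages"] for s in m["strategies"])
print(f"{len(used)} of 25 strategies observed:", used.most_common(3))

# Mean message length per game vs. win/loss; the paper reports a negative
# correlation over far more games than this toy sample.
lengths = [sum(m["len"] for m in g["messages"]) / len(g["messages"]) for g in games]
wins = [int(g["won"]) for g in games]
rho, p = spearmanr(lengths, wins)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```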
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Sarkar, Bidipta, Xia, Warren, Liu, C. Karen, Sadigh, Dorsa
Communicating in natural language is a powerful tool in multi-agent settings, as it enables independent agents to share information in partially observable settings and allows zero-shot coordination with humans. However, most prior works are limited as they either rely on training with large amounts of human demonstrations or lack the ability to generate natural and useful communication strategies. In this work, we train language models to have productive discussions about their environment in natural language without any human demonstrations. We decompose the communication problem into listening and speaking. Our key idea is to leverage the agent's goal to predict useful information about the world as a dense reward signal that guides communication. Specifically, we improve a model's listening skills by training it to predict information about the environment based on discussions, and we simultaneously improve its speaking skills with multi-agent reinforcement learning by rewarding messages based on their influence on other agents. To investigate the role and necessity of communication in complex social settings, we study an embodied social deduction game based on Among Us, where the key question to answer is the identity of an adversarial impostor. We analyze emergent behaviors due to our technique, such as accusing suspects and providing evidence, and find that it enables strong discussions, doubling win rates compared to standard RL. We release our code and models at https://socialdeductionllm.github.io/ (A sketch of the reward decomposition follows this entry's tags.)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (9 more...)
- Leisure & Entertainment > Games (0.46)
- Education (0.46)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
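
A minimal sketch of the listening/speaking decomposition the entry above describes, assuming each agent maintains belief logits over player slots; the interfaces and tensor shapes are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of the reward decomposition: a dense supervised
# "listening" loss and an influence-based "speaking" reward.
import torch
import torch.nn.functional as F

def listening_loss(belief_logits: torch.Tensor,
                   true_impostor: torch.Tensor) -> torch.Tensor:
    """Dense supervised signal: each agent predicts the impostor's
    identity from the discussion so far (logits over player slots)."""
    return F.cross_entropy(belief_logits, true_impostor)

def speaking_reward(beliefs_before: torch.Tensor,
                    beliefs_after: torch.Tensor,
                    true_impostor: torch.Tensor) -> torch.Tensor:
    """Reward a message by its influence on the other agents: how much
    their probability on the true impostor rose after hearing it."""
    idx = true_impostor.unsqueeze(-1)
    p_before = beliefs_before.softmax(-1).gather(-1, idx).squeeze(-1)
    p_after = beliefs_after.softmax(-1).gather(-1, idx).squeeze(-1)
    return (p_after - p_before).mean()

# Toy usage: 4 listeners, 5 player slots, player 2 is the impostor.
before = torch.randn(4, 5)
after = before.clone()
after[:, 2] += 1.0  # the message pushed beliefs toward the impostor
target = torch.full((4,), 2)
print(listening_loss(after, target).item(),
      speaking_reward(before, after, target).item())
```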
AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game
Chi, Yizhou, Mao, Lingjun, Tang, Zineng
Strategic social deduction games serve as valuable testbeds for evaluating the understanding and inference skills of language models, offering crucial insights into social science, artificial intelligence, and strategic gaming. This paper focuses on creating proxies of human behavior in simulated environments, using Among Us as the testbed. The study introduces a text-based game environment, named AmongAgents, that mirrors the dynamics of Among Us. Players act as crew members aboard a spaceship, tasked with identifying impostors who are sabotaging the ship and eliminating the crew. Within this environment, we analyze the behavior of simulated language agents. The experiments involve diverse game sequences featuring different configurations of Crewmate and Impostor personality archetypes. Our work demonstrates that state-of-the-art large language models (LLMs) can effectively grasp the game rules and make decisions based on the current context. This work aims to promote further exploration of LLMs in goal-oriented games with incomplete information and complex action spaces, as these settings offer valuable opportunities to assess language model performance in socially driven scenarios. (A minimal agent-loop sketch follows this entry.)
- Personal > Interview (0.93)
- Research Report (0.82)
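
A minimal sketch of how an LLM agent can be wired into a text-based game loop of this kind; the action format and the `query_llm` stub are assumptions for illustration, not the AmongAgents API:

```python
# Hypothetical sketch: one turn of an LLM agent in a text-based social
# deduction game. A real system would call an actual model API here.
import re

def query_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a free-text reply."""
    return "ACTION: vote player_3"

def parse_action(reply: str, legal_actions: list[str]) -> str:
    """Pull an action out of free text, falling back to a safe default."""
    match = re.search(r"ACTION:\s*(.+)", reply)
    choice = match.group(1).strip() if match else ""
    return choice if choice in legal_actions else legal_actions[0]

def play_turn(observation: str, legal_actions: list[str], role: str) -> str:
    prompt = (
        f"You are a {role} on a spaceship in a social deduction game.\n"
        f"Observation:\n{observation}\n"
        f"Legal actions: {legal_actions}\n"
        "Reply with 'ACTION: <choice>'."
    )
    return parse_action(query_llm(prompt), legal_actions)

action = play_turn("Meeting called: a body was reported in Electrical.",
                   ["skip", "vote player_3"], role="crewmate")
print(action)
```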
Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria
Kopparapu, Kavya, Duéñez-Guzmán, Edgar A., Matyas, Jayd, Vezhnevets, Alexander Sasha, Agapiou, John P., McKee, Kevin R., Everett, Richard, Marecki, Janusz, Leibo, Joel Z., Graepel, Thore
A key challenge in the study of multi-agent cooperation is that individual agents must not only cooperate effectively, but also decide with whom to cooperate. This is particularly critical when other agents have hidden, possibly misaligned motivations and goals. Social deduction games offer an avenue to study how individuals might learn to synthesize potentially unreliable information about others and elucidate their true motivations. In this work, we present Hidden Agenda, a two-team social deduction game that provides a 2D environment for studying learning agents in scenarios of unknown team alignment. The environment admits a rich set of strategies for both teams. Reinforcement learning agents trained in Hidden Agenda show that agents can learn a variety of behaviors, including partnering and voting, without the need for communication in natural language.
- South America > Brazil > São Paulo (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
If a robot is conscious, is it OK to turn it off? The moral implications of building true AIs
In the "Star Trek: The Next Generation" episode "The Measure of a Man," Data, an android crew member of the Enterprise, is to be dismantled for research purposes unless Captain Picard can argue that Data deserves the same rights as a human being. Naturally the question arises: What is the basis upon which something has rights? What gives an entity moral standing? The philosopher Peter Singer argues that creatures that can feel pain or suffer have a claim to moral standing. He argues that nonhuman animals have moral standing, since they can feel pain and suffer.
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Games > Chess (0.71)
- Information Technology > Artificial Intelligence > Robots (0.66)
The 10 Best Video Games of 2020
In a bizarre, unsettling, and oftentimes downright frightening year, video games became a port of refuge for many--be they longtime gamers, old-school veterans picking the controller back up after a break, or first-timers looking for a novel way to safely have fun or connect with friends during pandemic lockdowns. It's a small blessing, then, that it was also a banner year for excellent games to play. Here are TIME's best video games of 2020, according to our group of resident gamers, listed alphabetically. Also read TIME's lists of the 10 best fiction books of 2020 and the 100 must-read books of 2020. Nostalgia is big business right now, but reworking old joy rarely delivers that original thrill.