 iterated prisoner



Evaluating LLMs in Open-Source Games

Sistla, Swadesh, Kleiman-Weiner, Max

arXiv.org Artificial Intelligence

Large Language Models' (LLMs) programming capabilities enable their participation in open-source games: a game-theoretic setting in which players submit computer programs in lieu of actions. These programs offer numerous advantages, including interpretability, inter-agent transparency, and formal verifiability; additionally, they enable program equilibria, solutions that leverage the transparency of code and are inaccessible within normal-form settings. We evaluate the capabilities of leading open- and closed-weight LLMs to predict and classify program strategies and evaluate features of the approximate program equilibria reached by LLM agents in dyadic and evolutionary settings. We identify the emergence of payoff-maximizing, cooperative, and deceptive strategies, characterize the adaptation of mechanisms within these programs over repeated open-source games, and analyze their comparative evolutionary fitness. We find that open-source games serve as a viable environment to study and steer the emergence of cooperative strategy in multi-agent dilemmas.
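As a rough illustration of what an open-source game looks like, the sketch below lets each player submit a program that can read the opponent's source before choosing an action. It is not the setup used in the paper above: the payoff values, the `act(my_source, opponent_source)` interface, and the two example programs are all assumptions made for this example. The "mirror" program cooperates only with an exact copy of itself, a classic construction from the program equilibrium literature.

```python
# Hypothetical one-shot prisoner's dilemma payoffs: (player A, player B).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Each submitted program is a source string defining act(my_source, opponent_source).
MIRROR_SRC = '''
def act(my_source, opponent_source):
    # Cooperate only when facing an exact copy of this program.
    return "C" if opponent_source == my_source else "D"
'''

DEFECT_SRC = '''
def act(my_source, opponent_source):
    # Ignore the opponent's code and always defect.
    return "D"
'''


def run_program(source, opponent_source):
    """Execute a submitted program and return its action ("C" or "D")."""
    namespace = {}
    exec(source, namespace)  # illustration only; never exec untrusted code
    return namespace["act"](source, opponent_source)


def play(source_a, source_b):
    """Play one open-source game: each program sees the other's source."""
    action_a = run_program(source_a, source_b)
    action_b = run_program(source_b, source_a)
    return PAYOFFS[(action_a, action_b)]


if __name__ == "__main__":
    print(play(MIRROR_SRC, MIRROR_SRC))  # mutual cooperation
    print(play(MIRROR_SRC, DEFECT_SRC))  # mirror detects a different program
```

Two mirror programs cooperate with each other, while any deviation to a different program is met with defection, which is why mutual cooperation can be stable here even though it is not in the ordinary one-shot game.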



Collaboration and Conflict between Humans and Language Models through the Lens of Game Theory

Singh, Mukul, Radhakrishna, Arjun, Gulwani, Sumit

arXiv.org Artificial Intelligence

Language models are increasingly deployed in interactive online environments, from personal chat assistants to domain-specific agents, raising questions about their cooperative and competitive behavior in multi-party settings. While prior work has examined language model decision-making in isolated or short-term game-theoretic contexts, these studies often neglect long-horizon interactions, human-model collaboration, and the evolution of behavioral patterns over time. In this paper, we investigate the dynamics of language model behavior in the iterated prisoner's dilemma (IPD), a classical framework for studying cooperation and conflict. We pit model-based agents against a suite of 240 well-established classical strategies in an Axelrod-style tournament and find that language models achieve performance on par with, and in some cases exceeding, the best-known classical strategies. Behavioral analysis reveals that language models exhibit key properties associated with strong cooperative strategies (niceness, provocability, and generosity) while also demonstrating rapid adaptability to changes in opponent strategy mid-game. In controlled "strategy switch" experiments, language models detect and respond to shifts within only a few rounds, rivaling or surpassing human adaptability. These results provide the first systematic characterization of long-term cooperative behaviors in language model agents, offering a foundation for future research into their role in more complex, mixed human-AI social environments.
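For readers unfamiliar with the format, an Axelrod-style tournament can be sketched as a round-robin of iterated prisoner's dilemma matches. The snippet below is a minimal sketch, not the tournament from the paper above: the payoff values (3, 0, 5, 1), the three toy strategies, and the function names are assumptions chosen for illustration.

```python
from itertools import combinations

# Conventional IPD payoffs, assumed here: (player A, player B).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opp_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not opp_history else opp_history[-1]


def always_defect(own_history, opp_history):
    return "D"


def always_cooperate(own_history, opp_history):
    return "C"


def play_match(strategy_a, strategy_b, rounds=10):
    """Play a fixed-length IPD match and return both total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


def tournament(strategies, rounds=10):
    """Round-robin: every pair of distinct strategies plays one match."""
    totals = {name: 0 for name in strategies}
    for (name_a, strat_a), (name_b, strat_b) in combinations(strategies.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b, rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


if __name__ == "__main__":
    entrants = {
        "tit_for_tat": tit_for_tat,
        "always_defect": always_defect,
        "always_cooperate": always_cooperate,
    }
    print(tournament(entrants))
```

With only three entrants, unconditional defection scores highest because it exploits the unconditional cooperator; a larger field with more retaliating strategies changes that ranking, which is part of what makes such tournaments informative.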


A Ablations

Neural Information Processing Systems

We find that past play greatly stabilizes the emergence of reciprocity in IPD. In cells containing another agent, we include the RUSP observations in these channels. In Figure 11 we show results when training with RUSP in these environments. Consistent with past work, the greedy baseline fails to reach a solution with high collective return. We use a distributed computing infrastructure used in Berner et al.


Serious Games: Human-AI Interaction, Evolution, and Coevolution

Doreswamy, Nandini, Horstmanshof, Louise

arXiv.org Artificial Intelligence

The serious games between humans and AI have only just begun. Evolutionary Game Theory (EGT) models the competitive and cooperative strategies of biological entities. EGT could help predict the potential evolutionary equilibrium of humans and AI. The objective of this work was to examine some of the EGT models relevant to human-AI interaction, evolution, and coevolution. Of thirteen EGT models considered, three were examined: the Hawk-Dove Game, Iterated Prisoner's Dilemma, and the War of Attrition. This selection was based on the widespread acceptance and clear relevance of these models to potential human-AI evolutionary dynamics and coevolutionary trajectories. The Hawk-Dove Game predicts balanced mixed-strategy equilibria based on the costs of conflict. It also shows the potential for balanced coevolution rather than dominance. Iterated Prisoner's Dilemma suggests that repeated interaction may lead to cognitive coevolution. It demonstrates how memory and reciprocity can lead to cooperation. The War of Attrition suggests that competition for resources may result in strategic coevolution, asymmetric equilibria, and conventions on sharing resources. Therefore, EGT may provide a suitable framework to understand and predict the human-AI evolutionary dynamic. However, future research could extend beyond EGT and explore additional frameworks, empirical validation methods, and interdisciplinary perspectives. AI is being shaped by human input and is evolving in response to it. So too, neuroplasticity allows the human brain to grow and evolve in response to stimuli. If humans and AI converge in future, what might be the result of human neuroplasticity combined with an ever-evolving AI? Future research should be mindful of the ethical and cognitive implications of human-AI interaction, evolution, and coevolution.
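The mixed-strategy equilibrium of the Hawk-Dove Game mentioned above can be made concrete with the textbook payoff model, in which two Hawks split the resource value V but pay a conflict cost C, a Hawk takes everything from a Dove, and two Doves share. This is a minimal sketch under that standard model; the function names and the example numbers are assumptions for illustration and do not come from the paper.

```python
from fractions import Fraction


def hawk_share(value, cost):
    """Equilibrium probability of playing Hawk in the textbook Hawk-Dove game.

    When the conflict cost exceeds the resource value, the mixed equilibrium
    plays Hawk with probability value / cost; otherwise pure Hawk is stable.
    """
    v, c = Fraction(value), Fraction(cost)
    return Fraction(1) if v >= c else v / c


def expected_payoffs(p_hawk, value, cost):
    """Expected payoff of Hawk and of Dove against a population playing Hawk with p_hawk."""
    v, c = Fraction(value), Fraction(cost)
    hawk = p_hawk * (v - c) / 2 + (1 - p_hawk) * v
    dove = (1 - p_hawk) * v / 2
    return hawk, dove


if __name__ == "__main__":
    p = hawk_share(2, 4)              # resource worth 2, conflict costs 4
    print(p)                          # share of Hawk play at equilibrium
    print(expected_payoffs(p, 2, 4))  # both strategies earn the same payoff
```

At the equilibrium share, Hawk and Dove earn the same expected payoff, so neither strategy can spread at the other's expense; raising the cost of conflict lowers the equilibrium share of Hawk play.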


Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering

Ong, Kenneth J. K., Jun, Lye Jia, Nguyen, Hieu Minh "Jord", Cho, Seong Hah, Antolín, Natalia Pérez-Campanero

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important. However, they often struggle with cooperation, leading to suboptimal outcomes. Inspired by Axelrod's Iterated Prisoner's Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation. Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making. Our results show that higher Agreeableness and Conscientiousness improve cooperation but increase susceptibility to exploitation, highlighting both the potential and limitations of personality-based steering for aligning AI agents.


Multi-agent cooperation through learning-aware policy gradients

Meulemans, Alexander, Kobayashi, Seijin, von Oswald, Johannes, Scherrer, Nino, Elmoznino, Eric, Richards, Blake, Lajoie, Guillaume, Arcas, Blaise Agüera y, Sacramento, João

arXiv.org Artificial Intelligence

Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning. How can we achieve cooperation among self-interested, independent learning agents? Promising recent work has shown that in certain tasks cooperation can be established between learning-aware agents who model the learning dynamics of each other. Here, we present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning, which takes into account that other agents are themselves learning through trial and error based on multiple noisy trials. We then leverage efficient sequence models to condition behavior on long observation histories that contain traces of the learning dynamics of other agents. Training long-context policies with our algorithm leads to cooperative behavior and high returns on standard social dilemmas, including a challenging environment where temporally-extended action coordination is required. Finally, we derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.


Advantage Alignment Algorithms

Duque, Juan Agustin, Aghajohari, Milad, Cooijmans, Tim, Zhang, Tianyu, Courville, Aaron

arXiv.org Artificial Intelligence

The growing presence of artificially intelligent agents in everyday decision-making, from LLM assistants to autonomous vehicles, hints at a future in which conflicts may arise from each agent optimizing individual interests. These conflicts are apparent in general-sum games, where naive Reinforcement Learning agents get stuck in Pareto-suboptimal Nash equilibria. Consequently, opponent shaping has been introduced as a method with success at finding socially beneficial equilibria in social dilemmas. In this work, we introduce Advantage Alignment, a family of algorithms derived from first principles that perform opponent shaping efficiently and intuitively. This is achieved by aligning the advantages of conflicting agents in a given game by increasing the probability of mutually-benefiting actions. We prove that existing opponent shaping methods, including LOLA and LOQA, implicitly perform Advantage Alignment. Compared to these works, Advantage Alignment mathematically simplifies the formulation of opponent shaping and seamlessly works for continuous action domains. We also demonstrate the effectiveness of our algorithm in a wide range of social dilemmas, achieving state-of-the-art results in each case, including a social dilemma version of the Negotiation Game.


GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

Duan, Jinhao, Zhang, Renming, Diffenderfer, James, Kailkhura, Bhavya, Sun, Lichao, Stengel-Eskin, Elias, Bansal, Mohit, Chen, Tianlong, Xu, Kaidi

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks, across a comprehensive game taxonomy: complete versus incomplete information, dynamic versus static, and probabilistic versus deterministic scenarios. Then, we (1) characterize the game-theoretic reasoning of LLMs; and (2) perform LLM-vs.-LLM competitions as reasoning evaluation. We observe that (1) LLMs have distinct behaviors regarding various gaming scenarios; for example, LLMs fail in complete and deterministic games yet they are competitive in probabilistic gaming scenarios; (2) most open-source LLMs, e.g., CodeLlama-34b-Instruct and Llama-2-70b-chat, are less competitive than commercial LLMs, e.g., GPT-4, in complex games, yet the recently released Llama-3-70b-Instruct makes up for this shortcoming. In addition, code-pretraining greatly benefits strategic reasoning, while advanced reasoning methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT) do not always help. We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto Efficiency in repeated games. Detailed error profiles are provided for a better understanding of LLMs' behavior. We hope our research provides standardized protocols and serves as a foundation to spur further explorations in the strategic reasoning of LLMs.