Ex ante coordination and collusion in zero-sum multi-player extensive-form games

Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

Neural Information Processing Systems

Recent milestones in equilibrium computation, such as the success of Libratus, show that it is possible to compute strong solutions to two-player zero-sum games in theory and practice. This is not the case for games with more than two players, which remain one of the main open challenges in computational game theory. This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and many non-recreational applications such as war, where the colluders do not have time or means of communicating during battle; collusion in bidding, where communication during the auction is illegal; and coordinated swindling in public. The possibility for the team members to communicate before game play--that is, to coordinate their strategies ex ante--makes the use of behavioral strategies unsatisfactory. The reasons for this are closely related to the fact that the team can be represented as a single player with imperfect recall. We propose a new game representation, the realization form, that generalizes the sequence form but can also be applied to imperfect-recall games. Then, we use it to derive an auxiliary game that is equivalent to the original one. It provides a sound way to map the problem of finding an optimal ex ante-coordinated strategy for the team to the well-understood Nash equilibrium-finding problem in a (larger) two-player zero-sum perfect-recall game. By reasoning over the auxiliary game, we devise an anytime algorithm, fictitious team-play, that is guaranteed to converge to an optimal coordinated strategy for the team against an optimal opponent, and that is dramatically faster than the prior state-of-the-art algorithm for this problem.
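Fictitious team-play builds on classical fictitious play, in which each player repeatedly best-responds to the empirical mixture of the opponent's past actions and the averaged strategies converge to equilibrium. A minimal pure-Python sketch of ordinary fictitious play on a two-player zero-sum matrix game (not the paper's extensive-form team variant) illustrates the idea:

```python
def fictitious_play(A, iters=20000):
    """Fictitious play on a zero-sum matrix game.

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    Each round, both players best-respond to the opponent's empirical
    mixture of past actions; the averaged strategies converge to a
    Nash equilibrium (Robinson, 1951).
    """
    m, n = len(A), len(A[0])
    row_counts = [0.0] * m
    col_counts = [0.0] * n
    row_counts[0] = col_counts[0] = 1.0  # arbitrary initial actions
    for _ in range(iters):
        col_mix = [c / sum(col_counts) for c in col_counts]
        row_mix = [r / sum(row_counts) for r in row_counts]
        # Row maximizes expected payoff; column minimizes it.
        i = max(range(m), key=lambda a: sum(A[a][b] * col_mix[b] for b in range(n)))
        j = min(range(n), key=lambda b: sum(row_mix[a] * A[a][b] for a in range(m)))
        row_counts[i] += 1
        col_counts[j] += 1
    return ([r / sum(row_counts) for r in row_counts],
            [c / sum(col_counts) for c in col_counts])

# Matching pennies: the unique equilibrium mixes 50/50 for both players.
x, y = fictitious_play([[1, -1], [-1, 1]])
```

On matching pennies the individual iterates keep cycling, but the empirical strategies approach the (1/2, 1/2) equilibrium, which is why the algorithm is "anytime": the averages improve as iterations accumulate.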


Active Inference with Reusable State-Dependent Value Profiles

Poschl, Jacob

arXiv.org Machine Learning

Adaptive behavior in volatile environments requires agents to deploy different value-control regimes across latent contexts, but representing separate preferences, policy biases, and action confidence for every situation is intractable. We introduce value profiles: a small set of reusable bundles of value-related parameters--outcome preferences, policy priors, and policy precision--that are assigned to hidden states in the generative model. As posterior beliefs over states evolve trial-by-trial, effective control parameters emerge through belief-weighted mixing, enabling state-conditional strategy recruitment without maintaining independent parameters for each situation. We evaluate this framework in probabilistic reversal learning, comparing static precision, entropy-coupled dynamic precision, and profile-based models using cross-validated log-likelihood and information criteria. Model comparison using AIC favors the profile-based model over simpler alternatives (100-point differences), with consistent parameter recovery demonstrating structural identifiability even when context must be inferred from noisy observations. Model-based inference suggests that, in this task, adaptive control operates primarily through policy prior modulation rather than policy precision modulation, with gradual belief-driven profile recruitment confirming state-conditional rather than merely uncertainty-driven control. Overall, value profiles provide a tractable computational account of belief-conditioned value control in volatile environments: a reusable, mode-like representational scheme for behavioral flexibility that yields testable signatures of belief-conditioned control.
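The belief-weighted mixing step can be sketched in a few lines. The profile contents, state-to-profile assignments, and expected action values below are invented for illustration; the paper's actual generative model is richer:

```python
import math

# Each "profile" bundles value-related parameters: an outcome preference,
# a policy prior over two actions, and a policy precision (inverse
# temperature). All names and numbers are hypothetical.
profiles = {
    "exploit": {"preference": 1.0, "policy_prior": [0.8, 0.2], "precision": 4.0},
    "explore": {"preference": 0.2, "policy_prior": [0.5, 0.5], "precision": 1.0},
}
# Each hidden state is assigned one reusable profile.
state_profile = {"stable": "exploit", "reversed": "explore"}

def effective_parameters(belief):
    """Mix profile parameters, weighted by the posterior belief over states."""
    mixed = {"preference": 0.0, "policy_prior": [0.0, 0.0], "precision": 0.0}
    for state, p in belief.items():
        prof = profiles[state_profile[state]]
        mixed["preference"] += p * prof["preference"]
        mixed["precision"] += p * prof["precision"]
        for a in range(2):
            mixed["policy_prior"][a] += p * prof["policy_prior"][a]
    return mixed

def action_distribution(belief, expected_value=(0.6, 0.4)):
    """Softmax over actions using the belief-mixed prior and precision."""
    m = effective_parameters(belief)
    logits = [math.log(m["policy_prior"][a]) + m["precision"] * expected_value[a]
              for a in range(2)]
    z = [math.exp(l - max(logits)) for l in logits]
    return [v / sum(z) for v in z]
```

With full belief in the "stable" state the high-precision, biased-prior profile dominates and the policy is near-deterministic; as belief shifts toward "reversed" the mixed parameters soften the policy, with no per-state parameter tables maintained.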


Constant-Memory Strategies in Stochastic Games: Best Responses and Equilibria

Zhu, Fengming, Lin, Fangzhen

arXiv.org Artificial Intelligence

Stochastic games have become a prevalent framework for studying long-term multi-agent interactions, especially in the context of multi-agent reinforcement learning. In this work, we comprehensively investigate the concept of constant-memory strategies in stochastic games. We first establish some results on best responses and Nash equilibria for behavioral constant-memory strategies, followed by a discussion of the computational hardness of best-responding to mixed constant-memory strategies. These theoretical insights are later verified on several sequential decision-making testbeds, including the $\textit{Iterated Prisoner's Dilemma}$, the $\textit{Iterated Traveler's Dilemma}$, and the $\textit{Pursuit}$ domain. This work aims to enhance the understanding of theoretical issues in single-agent planning within multi-agent systems, and to uncover the connection between decision models in single-agent and multi-agent contexts. The code is available at $\texttt{https://github.com/Fernadoo/Const-Mem}$.
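To make the notion concrete, here is a sketch of a one-step-memory (constant-memory) behavioral strategy in the Iterated Prisoner's Dilemma; it is an illustrative toy, not code from the paper's repository:

```python
# Actions and the standard IPD stage-game payoffs (row, column).
C, D = 0, 1
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(prev_opponent_action):
    # Constant memory: conditions only on the opponent's last move.
    return C if prev_opponent_action is None else prev_opponent_action

def always_defect(prev_opponent_action):
    # Zero memory: ignores history entirely.
    return D

def play(strategy_a, strategy_b, rounds=100):
    """Run the iterated game; each strategy sees only the opponent's last move."""
    last_a = last_b = None
    total_a = total_b = 0
    for _ in range(rounds):
        a = strategy_a(last_b)
        b = strategy_b(last_a)
        pa, pb = PAYOFF[(a, b)]
        total_a += pa
        total_b += pb
        last_a, last_b = a, b
    return total_a, total_b
```

Tit-for-tat needs only the opponent's previous action regardless of the horizon, which is exactly the constant-memory property: its state space does not grow with the length of the history.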


Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions

Zhu, Quanyan

arXiv.org Artificial Intelligence

We introduce the LLM-Nash framework, a game-theoretic model where agents select reasoning prompts to guide decision-making via Large Language Models (LLMs). Unlike classical games that assume utility-maximizing agents with full rationality, this framework captures bounded rationality by modeling the reasoning process explicitly. Equilibrium is defined over the prompt space, with actions emerging as the behavioral output of LLM inference. This approach enables the study of cognitive constraints, mindset expressiveness, and epistemic learning. Through illustrative examples, we show how reasoning equilibria can diverge from classical Nash outcomes, offering a new foundation for strategic interaction in LLM-enabled systems.
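The idea of an equilibrium defined over the prompt space can be illustrated with a toy prompt-selection game. The prompts and payoff numbers below are hypothetical stand-ins for utilities that, in the framework, would be induced by running LLM inference under each prompt pair:

```python
# Two hypothetical reasoning prompts per agent.
PROMPTS = ["cooperative reasoning", "adversarial reasoning"]

# payoff[i][j] = (utility to agent 1, utility to agent 2) when agent 1
# uses prompt i and agent 2 uses prompt j. Illustrative numbers only.
payoff = [
    [(3, 3), (0, 4)],
    [(4, 0), (1, 1)],
]

def pure_equilibria(payoff):
    """Enumerate pure-strategy Nash equilibria of the prompt-selection game:
    prompt pairs where neither agent gains by switching prompts unilaterally."""
    n, m = len(payoff), len(payoff[0])
    eqs = []
    for i in range(n):
        for j in range(m):
            u1, u2 = payoff[i][j]
            no_dev_1 = all(payoff[k][j][0] <= u1 for k in range(n))
            no_dev_2 = all(payoff[i][l][1] <= u2 for l in range(m))
            if no_dev_1 and no_dev_2:
                eqs.append((PROMPTS[i], PROMPTS[j]))
    return eqs

eqs = pure_equilibria(payoff)
```

With these numbers the unique equilibrium has both agents select adversarial reasoning even though joint cooperative reasoning yields higher utility for both, a simple instance of how an equilibrium in mindsets can diverge from the welfare-maximizing outcome.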


Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems

Yang, Shuocun, Hu, Huawen, Shi, Enze, Zhang, Shu

arXiv.org Artificial Intelligence

Behavioral diversity in multi-agent reinforcement learning (MARL) represents an emerging and promising research area. Prior work has largely centered on intra-group behavioral consistency in multi-agent systems, with limited attention given to behavioral consistency in multi-agent grouping scenarios. In this paper, we introduce Dual-Level Behavioral Consistency (DLBC), a novel MARL control method designed to explicitly regulate agent behaviors at both intra-group and inter-group levels. DLBC partitions agents into distinct groups and dynamically modulates behavioral diversity both within and between them. Inter-group consistency constrains behavioral strategies across different groups and thereby enhances division of labor, while intra-group consistency, achieved by aligning behavioral strategies within each group, fosters stronger intra-group cooperation. Crucially, because DLBC directly constrains agent policy functions, it is broadly applicable across algorithmic frameworks. Experimental results in various grouping cooperation scenarios demonstrate that DLBC significantly enhances both intra-group cooperative performance and inter-group task specialization, yielding substantial performance improvements. DLBC offers a new approach to behavioral consistency control in multi-agent systems, and its application to more complex tasks and dynamic environments remains to be explored.
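One way to picture dual-level consistency is as two divergence terms over agent policies: one pulling group members toward each other, one pushing group-average behaviors apart. This is an illustrative sketch of that decomposition, not DLBC's actual training objective; the agent policies and grouping below are invented:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical: four agents' action distributions over three actions,
# partitioned into two groups of two.
policies = {
    "a1": [0.7, 0.2, 0.1], "a2": [0.6, 0.3, 0.1],   # group 1
    "b1": [0.1, 0.2, 0.7], "b2": [0.2, 0.2, 0.6],   # group 2
}
groups = {"g1": ["a1", "a2"], "g2": ["b1", "b2"]}

def consistency_terms(policies, groups):
    """Intra term: divergence of each member from its group mean (want small,
    i.e. aligned behavior within a group). Inter term: divergence between
    group-mean policies (want large, i.e. specialized groups)."""
    intra = 0.0
    means = {}
    for g, members in groups.items():
        k = len(policies[members[0]])
        means[g] = [sum(policies[m][a] for m in members) / len(members)
                    for a in range(k)]
        for m in members:
            intra += kl(policies[m], means[g])
    names = list(means)
    inter = sum(kl(means[g], means[h]) for g in names for h in names if g != h)
    return intra, inter

intra, inter = consistency_terms(policies, groups)
```

A training loss could reward small `intra` and large `inter` (e.g. `loss += intra - inter`), directly constraining the policy functions as the abstract describes, independently of the underlying MARL algorithm.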


Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models

Albert, Daniel, Billinger, Stephan

arXiv.org Artificial Intelligence

Two prominent approaches have emerged to advance our understanding of the microfoundations of strategy: computational work and human lab experiments. Agent-based computational simulations have sharpened our understanding of the performance and learning consequences stemming from differences in individuals' cognition (Csaszar and Levinthal 2016, Gavetti and Levinthal 2000, Knudsen and Srikanth 2014, Winter et al. 2007). Additionally, scholars have increasingly designed experiments to study human responses within various tasks, such as searching for high-performing alternatives in unknown decision spaces (Bergenholtz et al. 2023, Billinger et al. 2014, 2021, Richter et al. 2023), self-selecting into specific organizational tasks (Raveendran et al. 2022), exhibiting organizational voting behavior (Piezunka and Schilke 2023), and making innovation choices in response to different organizational contingencies (Klingebiel 2022). Despite significant strides, a key challenge in advancing behavioral strategy lies in building and testing theories of individual-level cognition and its effects on the revealed decisions that our field typically focuses on. More theoretical development and empirical testing are needed to understand when and why decision-makers follow particular heuristics in specific situations, and what task factors influence their cognitive processes.


State-Constrained Zero-Sum Differential Games with One-Sided Information

Ghimire, Mukesh, Zhang, Lei, Xu, Zhe, Ren, Yi

arXiv.org Artificial Intelligence

We study zero-sum differential games with state constraints and one-sided information, where the informed player (Player 1) has a categorical payoff type unknown to the uninformed player (Player 2). The goal of Player 1 is to minimize his payoff without violating the constraints, while that of Player 2 is to violate the state constraints if possible, or to maximize the payoff otherwise. One example of the game is a man-to-man matchup in football. Without state constraints, Cardaliaguet (2007) showed that the value of such a game exists and is convex in the common belief of the players. Our theoretical contribution is an extension of this result to games with state constraints and the derivation of the primal and dual subdynamic principles necessary for computing behavioral strategies. Unlike existing works concerned with the scalability of no-regret learning in games with discrete dynamics, our study reveals the underlying structure of strategies for belief manipulation resulting from information asymmetry and state constraints. This structure will be necessary for scalable learning on games with continuous actions and long time windows. We use a simplified football game to demonstrate the utility of this work: we reveal player positions and belief states in which the attacker should (or should not) play specific random deceptive moves to take advantage of information asymmetry, and compute how the defender should respond.


Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

Jin, Xuanfa, Wang, Ziyan, Du, Yali, Fang, Meng, Zhang, Haifeng, Wang, Jun

arXiv.org Artificial Intelligence

Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with them often neglect control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies due to the potential role changes that increase the uncertainty and complexity of the game. In this work, we first establish the existence of Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results show that discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from these analyses, we propose an RL-instructed language agent framework, in which a discussion policy trained by reinforcement learning (RL) is employed to determine appropriate discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework.


Effect of Monetary Reward on Users' Individual Strategies Using Co-Evolutionary Learning

Ueki, Shintaro, Toriumi, Fujio, Sugawara, Toshiharu

arXiv.org Artificial Intelligence

Consumer-generated media (CGM), such as social networking services, rely on the voluntary activity of users to prosper; users garner the psychological reward of feeling connected with other people through comments and reviews received online. To attract more users, some CGM have introduced monetary rewards (MR) for posting activity and for quality articles and comments. However, the impact of MR on users' article-posting strategies, especially their frequency and quality, has not been fully analyzed by previous studies, because they ignored differences in users' standpoints in the CGM network, such as how many friends/followers they have, although we think that users' strategies vary with their standpoints. The purpose of this study is to investigate the impact of MR on individual users by considering the differences in dominant strategies across user standpoints. Using a game-theoretic model of CGM and a multiple-world genetic algorithm, we experimentally show that a variety of realistic dominant strategies evolve depending on user standpoints in the CGM network.


HSVI can solve zero-sum Partially Observable Stochastic Games

Delage, Aurélien, Buffet, Olivier, Dibangoye, Jilles S., Saffidine, Abdallah

arXiv.org Artificial Intelligence

State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on linear programming or regret minimization, though not on dynamic programming (DP) or heuristic search (HS), while the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), DP and HS require introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs) as well, but how to apply it in the general case still remains an open question. We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value function that allow deriving bounds that come with solution strategies, (iii) proposing for the first time an HSVI-like solver that provably converges to an $\epsilon$-optimal solution in finite time, and (iv) empirically analyzing it. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.