Imperfect Information
Dominated Actions in Imperfect-Information Games
Dominance is a fundamental concept in game theory. In normal-form games dominated strategies can be identified in polynomial time. As a consequence, iterative removal of dominated strategies can be performed efficiently as a preprocessing step for reducing the size of a game before computing a Nash equilibrium. For imperfect-information games in extensive form, we could convert the game to normal form and then iteratively remove dominated strategies in the same way; however, this conversion may cause an exponential blowup in game size. In this paper we define and study the concept of dominated actions in imperfect-information games. Our main result is a polynomial-time algorithm for determining whether an action is dominated (strictly or weakly) by any mixed strategy in n-player games, which can be extended to an algorithm for iteratively removing dominated actions. This allows us to efficiently reduce the size of the game tree as a preprocessing step for Nash equilibrium computation. We explore the role of dominated actions empirically in "All In or Fold" No-Limit Texas Hold'em poker.
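As a simplified illustration of the normal-form dominance test the abstract builds on: checking whether a row action is strictly dominated by a mixture of two other rows reduces to a one-dimensional feasibility problem (the general test over all mixtures of strategies is a linear program). The payoff matrix and function name below are our own toy assumptions, not the paper's algorithm.

```python
def dominated_by_mixture(payoffs, a, i1, i2, eps=1e-9):
    """Return True if row action `a` is strictly dominated by some mixture
    p*i1 + (1-p)*i2 of rows i1 and i2.  `payoffs[i][j]` is the row player's
    payoff for row i against opponent column j.  (Toy two-strategy special
    case; the general test over all mixtures is a linear program.)"""
    lo, hi = 0.0, 1.0  # feasible interval for the weight p placed on i1
    for j in range(len(payoffs[0])):
        u1, u2, ua = payoffs[i1][j], payoffs[i2][j], payoffs[a][j]
        d = u1 - u2  # need p*u1 + (1-p)*u2 > ua, i.e. p*d > ua - u2
        if abs(d) < eps:
            if u2 <= ua + eps:
                return False  # this column's constraint fails for every p
        elif d > 0:
            lo = max(lo, (ua - u2) / d)  # p must lie above this bound
        else:
            hi = min(hi, (ua - u2) / d)  # p must lie below this bound
    if hi - lo <= eps:
        return False
    p = (lo + hi) / 2.0  # verify a point in the interior of the interval
    return all(
        p * payoffs[i1][j] + (1 - p) * payoffs[i2][j] > payoffs[a][j] + eps
        for j in range(len(payoffs[0]))
    )
```

For example, row [1, 1] is strictly dominated by an equal mixture of [3, 0] and [0, 3] (which yields [1.5, 1.5]), while row [2, 2] is not dominated by any mixture of those two rows.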
Strategy Logic, Imperfect Information, and Hyperproperties
Beutner, Raven, Finkbeiner, Bernd
Strategy logic (SL) is a powerful temporal logic that enables first-class reasoning over strategic behavior in multi-agent systems (MAS). In many MASs, the agents (and their strategies) cannot observe the global state of the system, leading to many extensions of SL centered around imperfect information, such as strategy logic with imperfect information (SL$_\mathit{ii}$). Along orthogonal lines, researchers have studied the combination of strategic behavior and hyperproperties. Hyperproperties are system properties that relate multiple executions in a system and commonly arise when specifying security policies. Hyper Strategy Logic (HyperSL) is a temporal logic that combines quantification over strategies with the ability to express hyperproperties on the executions of different strategy profiles. In this paper, we study the relation between SL$_\mathit{ii}$ and HyperSL. Our main result is that both logics (restricted to formulas where no state formulas are nested within path formulas) are equivalent in the sense that we can encode SL$_\mathit{ii}$ instances into HyperSL instances and vice versa. For the former direction, we build on the well-known observation that imperfect information is a hyperproperty. For the latter direction, we construct a self-composition of MASs and show how we can simulate hyperproperties using imperfect information.
Learning Value of Information towards Joint Communication and Control in 6G V2X
Lei, Lei, Zheng, Kan, Shen, Xuemin
As Cellular Vehicle-to-Everything (C-V2X) evolves towards future sixth-generation (6G) networks, Connected Autonomous Vehicles (CAVs) are emerging to become a key application. Leveraging data-driven Machine Learning (ML), especially Deep Reinforcement Learning (DRL), is expected to significantly enhance CAV decision-making in both vehicle control and V2X communication under uncertainty. These two decision-making processes are closely intertwined, with the value of information (VoI) acting as a crucial bridge between them. In this paper, we introduce Sequential Stochastic Decision Process (SSDP) models to define and assess VoI, demonstrating their application in optimizing communication systems for CAVs. Specifically, we formally define the SSDP model and demonstrate that the MDP model is a special case of it. The SSDP model offers a key advantage by explicitly representing the set of information that can enhance decision-making when available. Furthermore, as current research on VoI remains fragmented, we propose a systematic VoI modeling framework grounded in the MDP, Reinforcement Learning (RL) and Optimal Control theories. We define different categories of VoI and discuss their corresponding estimation methods. Finally, we present a structured approach to leverage the various VoI metrics for optimizing the "When", "What", and "How" to communicate problems. For this purpose, SSDP models are formulated with VoI-associated reward functions derived from VoI-based optimization objectives. While we use a simple vehicle-following control problem to illustrate the proposed methodology, it holds significant potential to facilitate the joint optimization of stochastic, sequential control and communication decisions in a wide range of networked control systems.
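The core "value with information minus value without" idea behind VoI can be made concrete in a one-step toy example. The belief, payoff numbers, and names below are our illustrative assumptions, not the paper's SSDP formulation:

```python
# Toy one-step VoI computation for a following vehicle whose leader's
# behavior is uncertain.  All numbers are illustrative assumptions.
priors = {"cruise": 0.7, "brake": 0.3}  # belief over the leader's state s
reward = {                               # r(action, s)
    ("keep", "cruise"): 1.0, ("keep", "brake"): -5.0,
    ("slow", "cruise"): 0.2, ("slow", "brake"):  0.5,
}
actions = ["keep", "slow"]

# Value WITHOUT communication: commit to one action against the prior.
v_no_info = max(
    sum(priors[s] * reward[(a, s)] for s in priors) for a in actions
)

# Value WITH communication: best action per revealed state, averaged.
v_info = sum(priors[s] * max(reward[(a, s)] for a in actions) for s in priors)

voi = v_info - v_no_info  # expected gain from transmitting the leader's state
```

Here v_no_info = 0.29 (always slowing down), v_info = 0.85, so transmitting the leader's state is worth 0.56 in expected reward; this is the quantity a "when to communicate" policy would weigh against communication cost.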
Too Noisy to Collude? Algorithmic Collusion Under Laplacian Noise
The rise of autonomous pricing systems has sparked growing concern over algorithmic collusion in markets from retail to housing. This paper examines controlled information quality as an ex ante policy lever: by reducing the fidelity of data that pricing algorithms draw on, regulators can frustrate collusion before supracompetitive prices emerge. We show, first, that information quality is the central driver of competitive outcomes, shaping prices, profits, and consumer welfare. Second, we demonstrate that collusion can be slowed or destabilized by injecting carefully calibrated noise into pooled market data, yielding a feasibility region where intervention disrupts cartels without undermining legitimate pricing. Together, these results highlight information control as a lightweight yet practical lever to blunt digital collusion at its source.
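A minimal sketch of the noise-injection lever described above, using only the standard library (the function names and the inverse-CDF Laplace sampler are ours, not the authors' implementation):

```python
import math
import random


def laplace_noise(scale, rng=random):
    """Sample from Laplace(0, scale) via the inverse CDF (stdlib only)."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def noisy_pool(prices, scale):
    """Degrade pooled market data by adding i.i.d. calibrated Laplace
    noise to each observation -- the ex ante intervention the abstract
    describes, with `scale` controlling information fidelity."""
    return [p + laplace_noise(scale) for p in prices]
```

With `scale = 0` the data passes through unchanged; raising `scale` lowers the fidelity of the pooled signal that pricing algorithms condition on, which is the regulator's dial in the feasibility analysis.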
PerfectDou: Dominating DouDizhu with Perfect Information Distillation
Yang, Guan
As a challenging multi-player card game, DouDizhu has recently drawn much attention for analyzing competition and collaboration in imperfect-information games. In this paper, we propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game, in an actor-critic framework with a proposed technique named perfect information distillation.
Responsibility Gap in Collective Decision Making
The responsibility gap is a set of outcomes of a collective decision-making mechanism in which no single agent is individually responsible. In general, when designing a decision-making process, it is desirable to minimise the gap. The paper proposes a concept of an elected dictatorship. It shows that, in a perfect information setting, the gap is empty if and only if the mechanism is an elected dictatorship. It also proves that in an imperfect information setting, the class of gap-free mechanisms is positioned strictly between two variations of the class of elected dictatorships.
Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO
Konicki, Christine, Chakraborty, Mithun, Wellman, Michael P.
Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods. Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form using data obtained from querying a simulator that represents a detailed description of the game. We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information. First, we introduce a scalable representation for the empirical game tree where edges correspond to implicit policies learned through DRL. These policies cover conditions in the underlying game abstracted in the game model, supporting sustainable growth of the tree over epochs. Second, we leverage extensive form in the empirical model by employing refined Nash equilibria to direct strategy exploration. To enable this, we give a modular and scalable algorithm based on generalized backward induction for computing a subgame perfect equilibrium (SPE) in an imperfect-information game. We experimentally evaluate our approach on a suite of games including an alternating-offer bargaining game with outside offers; our results demonstrate that TE-PSRO converges toward equilibrium faster when new strategies are generated based on SPE rather than Nash equilibrium, and with reasonable time/memory requirements for the growing empirical model.
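The generalized-backward-induction idea can be sketched on a perfect-information toy tree; the node encoding and function name below are our assumptions, and the paper's algorithm additionally handles imperfect-information games and the growing empirical model:

```python
def spe(node, name="root"):
    """Backward induction returning (plan, payoff_vector), where `plan`
    maps every decision node (by path name) to its equilibrium action --
    a toy perfect-information version of generalized backward induction.
    A node is ("leaf", payoffs) or ("move", player, {action: child})."""
    if node[0] == "leaf":
        return {}, node[1]
    _, player, children = node
    plan, best_a, best_pay = {}, None, None
    for a, child in children.items():
        sub_plan, pay = spe(child, f"{name}/{a}")
        plan.update(sub_plan)  # keep off-path prescriptions too (SPE)
        if best_pay is None or pay[player] > best_pay[player]:
            best_a, best_pay = a, pay
    plan[name] = best_a
    return plan, best_pay
```

In the classic two-stage example below, player 1 would choose b at the right subgame, so player 0 takes the outside option L; the returned plan prescribes actions at off-path nodes as well, which is what distinguishes an SPE from a bare Nash equilibrium.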
Review for NeurIPS paper: Joint Policy Search for Multi-agent Collaboration with Imperfect Information
Additional Feedback: Questions/Comments
- There is a slight inconsistency between Equations (1) and (3): in (1) you have A(I(h)), while in (3) you have A(h).
- Line 142: What is meant by the notation with a bar over the v? I don't see it introduced anywhere. This is confusing, since your main theorem involves the difference between two overbar-v quantities. It seems this might be the value of the root node under the policy, but that is not explicitly stated anywhere.
- It looks like you use the CFR1k strategy as a starting point for JPS. Did you experiment with using the other strategies (BAD and A2C) as starting points?
Review for NeurIPS paper: Joint Policy Search for Multi-agent Collaboration with Imperfect Information
This paper presents the concept of policy density change for collaborative imperfect-information games. All the reviewers agree that the idea is novel, appreciating the results in small games and in the much larger game of bridge (in particular, a comparison vs. WBridge5). Several problems were identified that the reviewers agree are minor enough to be addressed in the final copy. As noted, there are issues with the comparison to WBridge5, and the authors have agreed to revise their claim as a result. Clarifications on the connections to CFR and subgame decomposition should also be made.