Goto

Collaborating Authors

 Game Theory


Online Control in Population Dynamics (Zhou Lu)

Neural Information Processing Systems

The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of prediction rather than control. Existing mathematical models for population control are often restricted to specific, noise-free dynamics, while real-world population changes can be complex and adversarial. To address this gap, we propose a new framework based on the paradigm of online control. We first characterize a set of linear dynamical systems that can naturally model evolving populations. We then give an efficient gradient-based controller for these systems, with near-optimal regret bounds with respect to a broad class of linear policies. Our empirical evaluations demonstrate the effectiveness of the proposed algorithm for population control even in non-linear models such as SIR and replicator dynamics.
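To make the setup concrete, here is a minimal sketch of the online-control viewpoint, assuming a generic linear dynamical system x_{t+1} = A x_t + B u_t + w_t, a linear state-feedback policy u_t = -K x_t, and per-round gradient updates of K on the instantaneous cost. The matrices A and B, the quadratic cost, the noise scale, and the step size below are illustrative placeholders, not the paper's actual model or controller.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear dynamics x_{t+1} = A x_t + B u_t + w_t (placeholder values).
A = np.array([[0.95, 0.05],
              [0.02, 0.90]])
B = np.array([[0.1],
              [0.3]])

def step_cost(x, u):
    # Quadratic tracking cost: keep the population state near a target, penalize control effort.
    target = np.array([0.5, 0.5])
    return np.sum((x - target) ** 2) + 0.1 * np.sum(u ** 2)

K = np.zeros((1, 2))       # linear policy u_t = -K x_t
x = np.array([1.0, 0.2])   # initial population state
eta = 0.05                 # gradient step size
eps = 1e-4                 # finite-difference width

for t in range(500):
    u = -K @ x
    # Online (finite-difference) gradient of the instantaneous cost with respect to K.
    grad = np.zeros_like(K)
    base = step_cost(x, u)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            K_pert = K.copy()
            K_pert[i, j] += eps
            grad[i, j] = (step_cost(x, -K_pert @ x) - base) / eps
    K -= eta * grad
    # Noisy (possibly adversarial) perturbation w_t.
    w = 0.01 * rng.standard_normal(2)
    x = A @ x + B @ u + w

print("final state:", x, "policy:", K)

The paper's guarantees are stated as regret against the best linear policy in hindsight; the sketch only illustrates the interface of observing a state, acting, paying a cost, and updating the policy online.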


help decide allocation of canine units to terminals, and similar systems are adopted by the US Federal Air Marshal

Neural Information Processing Systems

We thank all reviewers for their very helpful comments. We will fix all typos and minor issues and incorporate the suggested changes; because of space constraints, we focus below only on the reviewers' major questions. Deployed security game systems help decide the allocation of canine units to terminals, and similar systems are adopted by the US Federal Air Marshal Service to deploy armed marshals to commercial flights [Jain, An, and Tambe, 2011, AI Magazine]. Our policy-based framework wraps the defender's learning algorithm as a sub-procedure and allows any learning algorithm to be plugged in; the actual learning process is therefore abstracted as a reporting stage in our paper. Reviewer 2: binary search finds the optimal EoP ξ within any desired precision ε > 0 in time O(log(1/ε)). It would be too demanding to seek the exact ξ even in the theoretical sense, as it is unclear how it could be computed exactly. In the QR setting, the attacker is still aware of the "bounded rationality" of the defender.
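On the binary-search point, a generic sketch of the idea, assuming a hypothetical monotone oracle feasible(xi) that reports whether a candidate value of ξ is attainable (this helper is not from the paper): bisection reaches any precision ε with O(log((hi - lo)/ε)) oracle calls.

def binary_search_threshold(feasible, lo, hi, eps=1e-6):
    """Largest xi in [lo, hi] with feasible(xi) True, up to precision eps.

    Assumes feasible is monotone: True below the threshold, False above it.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid   # mid is attainable; search higher
        else:
            hi = mid   # mid is not attainable; search lower
    return lo

# Hypothetical usage: the true threshold is 0.37.
print(binary_search_threshold(lambda xi: xi <= 0.37, 0.0, 1.0))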


Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play

Neural Information Processing Systems

Counterfactual Regret Minimization (CFR) and its variants are widely recognized as effective algorithms for solving extensive-form imperfect information games. Recently, many improvements have focused on enhancing the convergence speed of the CFR algorithm. However, most of these variants are not applicable under Monte Carlo (MC) conditions, making them unsuitable for training in large-scale games. We introduce a new MC-based algorithm for solving extensive-form imperfect information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best-response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies. Experimental results show that MCCFVFP achieved convergence speeds approximately 20% to 50% faster than the most advanced MCCFR variants in poker and other test games.
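As a toy illustration of the fictitious-play ingredient (best responding to the opponent's empirical average strategy), the sketch below runs plain fictitious play on rock-paper-scissors; it is not MCCFVFP itself, which additionally uses Monte Carlo sampling and CFR-style counterfactual values in extensive-form games.

import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

n = A.shape[0]
row_counts = np.zeros(n)   # how often each pure strategy was played
col_counts = np.zeros(n)
row_counts[0] = col_counts[0] = 1.0  # arbitrary initial play

for t in range(10000):
    # Each player best responds to the opponent's empirical (average) strategy.
    col_avg = col_counts / col_counts.sum()
    row_avg = row_counts / row_counts.sum()
    row_br = np.argmax(A @ col_avg)   # row maximizes its expected payoff
    col_br = np.argmin(row_avg @ A)   # column minimizes the row player's payoff
    row_counts[row_br] += 1
    col_counts[col_br] += 1

print("row average strategy:", row_counts / row_counts.sum())
print("col average strategy:", col_counts / col_counts.sum())
# Both empirical averages approach the uniform Nash equilibrium (1/3, 1/3, 1/3).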


Dueling Over Dessert, Mastering the Art of Repeated Cake Cutting

Neural Information Processing Systems

We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered in Aumann and Maschler (1995).
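A minimal sketch of one round of the sequential protocol, with hypothetical piecewise-constant valuation densities over [0, 1]: Alice cuts at a point, Bob takes the piece he values more, and Alice keeps the remainder. The densities and cut point below are illustrative, not taken from the paper.

import numpy as np

def piece_value(density, grid, a, b):
    # Value of the interval [a, b] under a piecewise-constant density on a uniform grid.
    mask = (grid >= a) & (grid < b)
    return float(np.sum(density[mask]) * (grid[1] - grid[0]))

# Hypothetical valuation densities (each integrates to 1 over [0, 1]).
grid = np.linspace(0.0, 1.0, 1000, endpoint=False)
alice_density = np.where(grid < 0.5, 1.6, 0.4)   # Alice values the left half more
bob_density   = np.where(grid < 0.5, 0.6, 1.4)   # Bob values the right half more

def play_round(cut):
    """Sequential round: Bob sees Alice's cut and takes his preferred piece."""
    bob_left  = piece_value(bob_density, grid, 0.0, cut)
    bob_right = piece_value(bob_density, grid, cut, 1.0)
    bob_takes_left = bob_left >= bob_right
    alice_piece = (cut, 1.0) if bob_takes_left else (0.0, cut)
    alice_value = piece_value(alice_density, grid, *alice_piece)
    bob_value = max(bob_left, bob_right)
    return alice_value, bob_value

print(play_round(cut=0.5))   # one round with Alice cutting at the midpoint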


Unravelling in Collaborative Learning (Antoine Scheid, Eric Moulines)

Neural Information Processing Systems

Collaborative learning offers a promising avenue for leveraging decentralized data. However, collaboration in groups of strategic learners is not a given. In this work, we consider strategic agents who wish to train a model together but have sampling distributions of different quality. The collaboration is organized by a benevolent aggregator who gathers samples so as to maximize total welfare, but is unaware of data quality. This setting allows us to shed light on the deleterious effect of adverse selection in collaborative learning. More precisely, we demonstrate that when data quality indices are private, the coalition may undergo a phenomenon known as unravelling, wherein it shrinks to the point of becoming empty or containing only the worst agent. We show how this issue can be addressed without external transfers, by proposing a novel method inspired by probabilistic verification. This approach makes the grand coalition a Nash equilibrium with high probability despite the information asymmetry, thereby breaking unravelling.


Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks

Neural Information Processing Systems

While Nash equilibrium in extensive-form games is well understood, very little is known about the properties of extensive-form correlated equilibrium (EFCE), both from a behavioral and from a computational point of view. In this setting, the strategic behavior of players is complemented by an external device that privately recommends moves to agents as the game progresses; players are free to deviate at any time, but will then not receive future recommendations.


A List of contributions

Neural Information Processing Systems

This paper makes several contributions, which we summarize here. Prior work has developed RL+Search for two-player zero-sum perfect-information games. There has also been prior work on learning value functions in fully cooperative imperfect-information games [19] and limited subsets of zero-sum imperfect-information games [29]. However, we are not aware of any prior RL+Search algorithms for two-player zero-sum games in general. We view this as the central contribution of this paper. Theorem 3 proves that, when doing search at test time with an accurate PBS value function, one can empirically play according to a Nash equilibrium by sampling a random iteration and passing down the beliefs produced by that iteration's policy. This result applies regardless of how the value function was trained and therefore applies to earlier techniques that use a PBS value function, such as DeepStack [40]. We describe the CFR-AVG algorithm in Appendix I. CFR-D [16] is a way to conduct depth-limited solving of a subgame with CFR when given a value function for PBSs.
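The "sample a random iteration" statement can be illustrated in the simplest setting: run a regret-minimizing self-play procedure (here plain regret matching on a matrix game, not the paper's PBS-based search), store each iteration's strategy, and play by drawing one iteration uniformly at random; in expectation this reproduces the average strategy, which approximates a Nash equilibrium in two-player zero-sum games.

import numpy as np

A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)  # rock-paper-scissors payoff for the row player
n = A.shape[0]

def regret_matching(regrets):
    positive = np.maximum(regrets, 0.0)
    return positive / positive.sum() if positive.sum() > 0 else np.ones(n) / n

row_regret = np.zeros(n)
col_regret = np.zeros(n)
row_strategies = []

for t in range(5000):
    row_strategy = regret_matching(row_regret)
    col_strategy = regret_matching(col_regret)
    row_strategies.append(row_strategy)
    # Value of each pure action against the opponent's current strategy.
    row_values = A @ col_strategy
    col_values = -(row_strategy @ A)
    row_regret += row_values - row_strategy @ row_values
    col_regret += col_values - col_strategy @ col_values

rng = np.random.default_rng(0)
sampled = row_strategies[rng.integers(len(row_strategies))]
average = np.mean(row_strategies, axis=0)
print("sampled iteration's strategy:", np.round(sampled, 3))
print("average strategy (approx. Nash):", np.round(average, 3))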


Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Neural Information Processing Systems

The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.


End-to-End Learning and Intervention in Games

Neural Information Processing Systems

In a social system, the self-interest of agents can be detrimental to the collective good, sometimes leading to social dilemmas. To resolve such a conflict, a central designer may intervene by either redesigning the system or incentivizing the agents to change their behaviors. To be effective, the designer must anticipate how the agents react to the intervention, which is dictated by their often unknown payoff functions. Therefore, learning about the agents is a prerequisite for intervention. In this paper, we provide a unified framework for learning and intervention in games. We cast the equilibria of games as individual layers and integrate them into an end-to-end optimization framework.
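A minimal sketch of the "equilibrium as a differentiable layer" idea, for a hypothetical two-player quadratic game whose Nash equilibrium is the solution of a linear system: the designer tunes an intervention parameter theta by gradient descent, differentiating through the equilibrium map in closed form. The game, costs, and intervention below are illustrative placeholders, not the paper's model.

import numpy as np

# Player i's cost: 0.5 * a_i * x_i^2 + b_i * x_i * x_{-i} + (c_i + theta) * x_i.
# Setting each first-order condition to zero gives the linear system
#   M x = -(c + theta * 1),   with M = [[a1, b1], [b2, a2]].
a = np.array([2.0, 3.0])
b = np.array([0.5, 0.4])
c = np.array([-1.0, -1.5])
M = np.array([[a[0], b[0]],
              [b[1], a[1]]])

def equilibrium(theta):
    # "Equilibrium layer": map the intervention theta to the Nash equilibrium actions.
    return np.linalg.solve(M, -(c + theta))

target = np.array([0.2, 0.3])                 # actions the designer would like to induce
dx_dtheta = np.linalg.solve(M, -np.ones(2))   # implicit derivative of the layer w.r.t. theta

theta = 0.0
for step in range(200):
    x = equilibrium(theta)
    grad = 2.0 * (x - target) @ dx_dtheta     # chain rule through the equilibrium layer
    theta -= 0.1 * grad

print("intervention theta:", round(theta, 4), "induced equilibrium:", np.round(equilibrium(theta), 4))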


learning in games focus on the easy setting of two-player, zero-sum games, where optimality of the solution is always

Neural Information Processing Systems

Theorems 4 and 5 should be evaluated as the key building blocks of CFR-Jr: Theorem 4 is necessary to show the soundness of the reconstruction algorithm, while Theorem 5 shows that CFR-Jr approaches the set of CCEs. This can never happen, as the plans σ built by the reconstruction procedure are always different (see the proof of Theorem 4). The convergence guarantees are inherited from CFR [43], because our reconstruction procedure does not alter the way in which regret is minimized. We employed the optimal payoff (the maximum sum of players' utilities); we will clarify this in the paper. In general, CFR-S performs worse than CFR-Jr, as it needs many more iterations to converge. In practice, CFR-Jr allows building dramatically smaller solutions; e.g., the figure displays the percentage difference, considering G2-4 with different tie-breaking rules. See the answer to Q.5 of Reviewer 1 for more details on how we compute the social welfare ratio.