Chakrabarti, Darshan
Efficient Learning in Polyhedral Games via Best Response Oracles
Chakrabarti, Darshan, Farina, Gabriele, Kroer, Christian
Learning in games is a well-studied framework in which agents iteratively refine their strategies through repeated interactions with their environment. One natural way for agents to refine their strategies iteratively is by best-responding. This idea can be applied in many forms, the simplest and earliest instance of which was fictitious play (FP) [Brown, 1951]. In this algorithm, the agent observes the strategies played by the opponent and then plays a best response to the average of the observed strategies. This algorithm was shown to converge [Robinson, 1951], but its convergence rate can, in the worst case, scale quite poorly with the number of actions available to each player [Daskalakis and Pan, 2014]. It is then natural to ask what the best achievable convergence guarantees are for computing Nash equilibria in two-player zero-sum games, or coarse correlated equilibria in multiplayer games, when agents learn through a best-response oracle. In the online learning community, methods based only on best-response oracles are special cases of methods based on a linear minimization oracle (LMO), which can be queried for points that minimize a linear objective over the feasible set. Such methods are known as projection-free methods because they avoid potentially expensive projections onto the feasible set. Projection-free online learning algorithms may perform multiple LMO calls per iteration, so our paper and the related literature are concerned not only with the number of iterations T of online learning but also with the total number of LMO calls, which we denote by N. Because LMOs for polyhedral decision sets essentially correspond to best-response oracles (BROs), we use the two terms interchangeably.
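The fictitious play dynamic described above admits a compact sketch. The following is a minimal illustrative implementation (not the paper's algorithm) on a zero-sum matrix game: each round, each player's single best-response computation is exactly one BRO/LMO call over the probability simplex.

```python
import numpy as np

def fictitious_play(A, T=2000):
    """Fictitious play (FP) on a zero-sum matrix game with payoff matrix A.

    The row player maximizes x^T A y; the column player minimizes it.
    Each round, every player best-responds to the opponent's empirical
    (average) strategy -- a single best-response-oracle (LMO) call.
    """
    m, n = A.shape
    row_counts, col_counts = np.zeros(m), np.zeros(n)
    row_counts[0] += 1  # arbitrary opening moves
    col_counts[0] += 1
    for _ in range(1, T):
        y_avg = col_counts / col_counts.sum()
        x_avg = row_counts / row_counts.sum()
        row_counts[np.argmax(A @ y_avg)] += 1    # BR to column's average
        col_counts[np.argmin(A.T @ x_avg)] += 1  # BR to row's average
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Rock-paper-scissors: the unique equilibrium is uniform play.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x, y = fictitious_play(A)
```

On rock-paper-scissors the empirical frequencies approach the uniform equilibrium, but, consistent with the worst-case behavior discussed above, the convergence is slow compared to modern methods.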
Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design
Brero, Gianluca, Eden, Alon, Chakrabarti, Darshan, Gerstgrasser, Matthias, Greenwald, Amy, Li, Vincent, Parkes, David C.
We introduce a reinforcement learning framework for economic design in which the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining the followers' response strategies into the leader's learning environment, yielding a formulation of the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that the leader's optimal strategy in the Stackelberg game is the optimal policy in our Stackelberg POMDP under a limited set of possible policies, establishing a connection between solving POMDPs and solving Stackelberg games. We solve our POMDP under this limited set of policies via the centralized-training-with-decentralized-execution framework. For the specific case of followers modeled as no-regret learners, we solve an array of increasingly complex settings, including problems of indirect mechanism design with turn-taking and limited communication between agents. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of a coarse correlated equilibrium, extending known results to the case of correlated types.
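As background on the follower model, here is a minimal sketch of one standard no-regret learner, multiplicative weights, run against an arbitrary sequence of payoff vectors. This is purely illustrative: the paper does not prescribe this particular algorithm, only that followers satisfy a no-regret property (average regret vanishing as the number of rounds grows).

```python
import numpy as np

def multiplicative_weights(payoffs, eta=0.1):
    """Run the multiplicative-weights no-regret learner on a sequence of
    payoff vectors (one row per round, one entry per action).  Returns
    the average realized payoff and the average external regret."""
    T, n = payoffs.shape
    w = np.ones(n)
    realized = 0.0
    for u in payoffs:
        p = w / w.sum()           # current mixed strategy
        realized += p @ u
        w *= np.exp(eta * u)      # exponentially reward good actions
    best_fixed = payoffs.sum(axis=0).max()  # best fixed action in hindsight
    return realized / T, (best_fixed - realized) / T

rng = np.random.default_rng(0)
payoffs = rng.uniform(0.0, 1.0, size=(5000, 4))
avg_payoff, avg_regret = multiplicative_weights(payoffs)
```

The average external regret stays near zero over long horizons, which is the property the convergence results for followers rely on.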
Block-Coordinate Methods and Restarting for Solving Extensive-Form Games
Chakrabarti, Darshan, Diakonikolas, Jelena, Kroer, Christian
Coordinate descent methods are popular in machine learning and optimization for their simple sparse updates and excellent practical performance. In the context of large-scale sequential game solving, these same properties would be attractive, but until now no such methods were known, because the strategy spaces do not satisfy the separable block structure typically exploited by such methods. We present the first cyclic coordinate-descent-like method for the polytope of sequence-form strategies, which form the strategy spaces of the players in an extensive-form game (EFG). Our method exploits the recursive structure of the proximal update induced by what are known as dilated regularizers in order to allow for a pseudo block-wise update. We show that our method enjoys an $O(1/T)$ convergence rate to a two-player zero-sum Nash equilibrium, while avoiding the worst-case polynomial scaling with the number of blocks common to cyclic methods. We show empirically that our algorithm usually outperforms other state-of-the-art first-order methods (e.g., mirror prox), and that it can occasionally even beat CFR$^+$, a state-of-the-art algorithm for numerical equilibrium computation in zero-sum EFGs. We then introduce a restarting heuristic for EFG solving and show empirically that restarting can lead to speedups, sometimes substantial, both for our cyclic method and for existing methods such as mirror prox and predictive CFR$^+$.
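The flavor of restarting can be illustrated on a simple bilinear saddle point. The sketch below pairs entropic mirror prox (one of the baselines mentioned above) on a matrix game with a restart rule: whenever the duality gap of the running average halves, reset the averaging and warm-start from that average. The specific trigger and warm start here are illustrative assumptions, not the paper's exact heuristic or its EFG setting.

```python
import numpy as np

def simplex_mwu(z, g, eta):
    """Entropic (multiplicative-weights) prox step on the simplex."""
    w = z * np.exp(eta * g)
    return w / w.sum()

def mirror_prox_restart(A, iters=1000, eta=0.1, restart_factor=0.5):
    """Entropic mirror prox on max_x min_y x^T A y with restarts:
    when the duality gap of the running average drops below
    `restart_factor` times its value at the last restart, reset the
    averaging and warm-start from the current average."""
    m, n = A.shape
    gap = lambda x, y: (A @ y).max() - (A.T @ x).min()
    x, y = np.ones(m) / m, np.ones(n) / n
    xs, ys, k = np.zeros(m), np.zeros(n), 0
    thresh = gap(x, y)                  # gap at the last restart
    best_gap, best_xy = thresh, (x, y)
    for _ in range(iters):
        xh = simplex_mwu(x, A @ y, eta)         # extrapolation step
        yh = simplex_mwu(y, -(A.T @ x), eta)
        x = simplex_mwu(x, A @ yh, eta)         # update step
        y = simplex_mwu(y, -(A.T @ xh), eta)
        xs += x; ys += y; k += 1
        xa, ya = xs / k, ys / k
        g = gap(xa, ya)
        if g < best_gap:
            best_gap, best_xy = g, (xa, ya)
        if g < restart_factor * thresh:         # restart the averaging
            thresh = g
            x, y = xa, ya                       # warm start from the average
            xs, ys, k = np.zeros(m), np.zeros(n), 0
    return best_xy

rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(4, 5))
x, y = mirror_prox_restart(A)
```

The returned pair has a small duality gap; the restart rule discards stale early iterates from the average, which is the intuition behind the observed speedups.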
Implications of Distance over Redistricting Maps: Central and Outlier Maps
Esmaeili, Seyed A., Chakrabarti, Darshan, Grape, Hayley, Brubach, Brian
In representative democracy, a redistricting map is chosen to partition an electorate into a collection of districts, each of which elects a representative. A valid redistricting map must satisfy a collection of constraints, such as being compact, contiguous, and of almost equal population. However, these constraints are still loose enough to admit an enormous ensemble of valid redistricting maps. This fact makes drawing redistricting maps difficult, and it also enables a partisan legislature to gerrymander by choosing a map that unfairly favors it. In this paper, we introduce an interpretable and tractable distance measure over redistricting maps that does not use election results, and we study its implications for the ensemble of redistricting maps. Specifically, we define a central map, which may be considered "most typical," and give a rigorous justification for it by showing that it mirrors the Kemeny ranking in a scenario where a committee votes over a collection of redistricting maps to be drawn. We include run-time and sample complexity analyses for our algorithms, including some negative results that hold for any algorithm. We further study outlier detection based on this distance measure: we show that gerrymandered maps lie far from our central maps in comparison to a large ensemble of valid redistricting maps. Since our distance measure does not rely on election results, it offers a significant advantage in gerrymandering detection that is lacking in all previous methods.
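The central-map and outlier ideas can be illustrated generically. Given any pairwise-distance matrix over an ensemble, the medoid (the element minimizing total distance to all others) serves as a center, and elements unusually far from it are flagged as outliers. This is only an illustrative sketch under that generic setup; the paper's distance measure, its Kemeny-ranking justification, and its detection guarantees are specific to redistricting maps.

```python
import numpy as np

def central_and_outliers(D, outlier_quantile=0.95):
    """Given a symmetric pairwise-distance matrix D over an ensemble,
    return the index of the central (medoid) element -- the one that
    minimizes total distance to all others -- and the indices of
    outliers, i.e. elements whose distance to the center exceeds the
    given quantile of distances to the center."""
    center = int(np.argmin(D.sum(axis=1)))
    d_to_center = D[center]
    cutoff = np.quantile(d_to_center, outlier_quantile)
    outliers = np.where(d_to_center > cutoff)[0]
    return center, outliers

# Toy ensemble: five "maps" on a line; the last one is far from the rest.
positions = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
D = np.abs(positions[:, None] - positions[None, :])
center, outliers = central_and_outliers(D)
```

In this toy example the middle element is central and the distant one is flagged, mirroring how a gerrymandered map would stand out against a large ensemble of valid maps.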
A Pairwise Fair and Community-preserving Approach to k-Center Clustering
Brubach, Brian, Chakrabarti, Darshan, Dickerson, John P., Khuller, Samir, Srinivasan, Aravind, Tsepenekas, Leonidas
Clustering is a foundational problem in machine learning with numerous applications. As machine learning grows increasingly ubiquitous as a backend for automated systems, concerns about fairness arise. Much of the current literature on fairness deals with discrimination against protected classes in supervised learning (group fairness). We define a different notion of fair clustering in which the probability that two points (or a community of points) become separated is bounded by an increasing function of their pairwise distance (or community diameter). We capture the situation where data points represent people who gain some benefit from being clustered together. Unfairness arises when certain points are deterministically separated, either arbitrarily or by someone who intends to harm them, as in the case of gerrymandering election districts. In response, we formally define two new types of fairness in the clustering setting: pairwise fairness and community preservation. To explore the practicality of our fairness goals, we devise an approach for extending existing $k$-center algorithms to satisfy these fairness constraints. Analysis of this approach proves that reasonable approximations can be achieved while maintaining fairness. In experiments, we compare the effectiveness of our approach to classical $k$-center algorithms/heuristics and explore the tradeoff between optimal clustering and fairness.
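As background on the base algorithms being extended, here is a sketch of the classic greedy farthest-first traversal for $k$-center (Gonzalez's 2-approximation), without any fairness constraints. It is one of the standard $k$-center algorithms, not the paper's fair variant.

```python
import numpy as np

def gonzalez_k_center(points, k, seed=0):
    """Farthest-first traversal for k-center (Gonzalez-style greedy),
    a classic 2-approximation.  Returns the chosen center indices and
    the objective value (the maximum point-to-center distance)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    centers = [int(rng.integers(n))]          # arbitrary first center
    dist = np.linalg.norm(points - points[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))            # farthest point from current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())

# Three well-separated clusters; greedy should pick one center in each.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(c, 0.1, size=(20, 2))
                 for c in ([0, 0], [5, 5], [0, 5])])
centers, radius = gonzalez_k_center(pts, k=3)
```

Note that this greedy procedure is deterministic given the first center, which is exactly the kind of deterministic separation the pairwise-fairness notion above is designed to avoid.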