Sandholm, Tuomas
Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability
Zhang, Brian Hu, Anagnostides, Ioannis, Tewolde, Emanuel, Berker, Ratip Emin, Farina, Gabriele, Conitzer, Vincent, Sandholm, Tuomas
$\Phi$-equilibria -- and the associated notion of $\Phi$-regret -- are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $\Phi$ begets stronger notions of rationality. Recently, Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '24) -- abbreviated as DFFPS -- settled the existence of efficient algorithms when $\Phi$ contains only linear maps under a general, $d$-dimensional convex constraint set $\mathcal{X}$. In this paper, we significantly extend their work by resolving the case where $\Phi$ is $k$-dimensional; degree-$\ell$ polynomials constitute a canonical such example with $k = d^{O(\ell)}$. In particular, positing only oracle access to $\mathcal{X}$, we obtain two main positive results: i) a $\text{poly}(n, d, k, \text{log}(1/\epsilon))$-time algorithm for computing $\epsilon$-approximate $\Phi$-equilibria in $n$-player multilinear games, and ii) an efficient online algorithm that incurs average $\Phi$-regret at most $\epsilon$ using $\text{poly}(d, k)/\epsilon^2$ rounds. We also show nearly matching lower bounds in the online learning setting, thereby obtaining for the first time a family of deviations that captures the learnability of $\Phi$-regret. From a technical standpoint, we extend the framework of DFFPS from linear maps to the more challenging case of maps with polynomial dimension. At the heart of our approach is a polynomial-time algorithm for computing an expected fixed point of any $\phi : \mathcal{X} \to \mathcal{X}$ based on the ellipsoid against hope (EAH) algorithm of Papadimitriou and Roughgarden (JACM '08). In particular, our algorithm for computing $\Phi$-equilibria is based on executing EAH in a nested fashion -- each step of EAH itself being implemented by invoking a separate call to EAH.
Expected Variational Inequalities
Zhang, Brian Hu, Anagnostides, Ioannis, Tewolde, Emanuel, Berker, Ratip Emin, Farina, Gabriele, Conitzer, Vincent, Sandholm, Tuomas
Variational inequalities (VIs) encompass many fundamental problems in diverse areas ranging from engineering to economics and machine learning. However, their considerable expressivity comes at the cost of computational intractability. In this paper, we introduce and analyze a natural relaxation -- which we refer to as expected variational inequalities (EVIs) -- where the goal is to find a distribution that satisfies the VI constraint in expectation. By adapting recent techniques from game theory, we show that, unlike VIs, EVIs can be solved in polynomial time under general (nonmonotone) operators. EVIs capture the seminal notion of correlated equilibria, but enjoy a greater reach beyond games. We also employ our framework to capture and generalize several existing disparate results, including from settings such as smooth games, and games with coupled constraints or nonconcave utilities.
A Multiagent Path Search Algorithm for Large-Scale Coalition Structure Generation
Taguelmimt, Redha, Aknine, Samir, Boukredera, Djamila, Changder, Narayan, Sandholm, Tuomas
Coalition structure generation (CSG), i.e. the problem of optimally partitioning a set of agents into coalitions to maximize social welfare, is a fundamental computational problem in multiagent systems. This problem is important for many applications where small run times are necessary, including transportation and disaster response. In this paper, we develop SALDAE, a multiagent path finding algorithm for CSG that operates on a graph of coalition structures. Our algorithm utilizes a variety of heuristics and strategies to perform the search and guide it. It is an anytime algorithm that can handle large problems with hundreds and thousands of agents. We show empirically on nine standard value distributions, including disaster response and electric vehicle allocation benchmarks, that our algorithm enables a rapid finding of high-quality solutions and compares favorably with other state-of-the-art methods.
The Power of Perturbation under Sampling in Solving Extensive-Form Games
Masaka, Wataru, Sakamoto, Mitsuki, Abe, Kenshi, Ariu, Kaito, Sandholm, Tuomas, Iwasaki, Atsushi
This paper investigates how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in imperfect-information extensive-form games. Perturbing the expected payoffs guarantees that the FTRL dynamics reach an approximate equilibrium, and proper adjustments of the magnitude of the perturbation lead to a Nash equilibrium (\textit{last-iterate convergence}). This approach is robust even when payoffs are estimated using sampling -- as is the case for large games -- while the optimistic approach often becomes unstable. Building upon those insights, we first develop a general framework for perturbed FTRL algorithms under \textit{sampling}. We then empirically show that in the last-iterate sense, the perturbed FTRL consistently outperforms the non-perturbed FTRL. We further identify a divergence function that reduces the variance of the estimates for perturbed payoffs, with which it significantly outperforms the prior algorithms on Leduc poker (whose structure is more asymmetric in a sense than that of the other benchmark games) and consistently performs smooth convergence behavior on all the benchmark games.
Computing Game Symmetries and Equilibria That Respect Them
Tewolde, Emanuel, Zhang, Brian Hu, Oesterheld, Caspar, Sandholm, Tuomas, Conitzer, Vincent
Strategic interactions can be represented more concisely, and analyzed and solved more efficiently, if we are aware of the symmetries within the multiagent system. Symmetries also have conceptual implications, for example for equilibrium selection. We study the computational complexity of identifying and using symmetries. Using the classical framework of normal-form games, we consider game symmetries that can be across some or all players and/or actions. We find a strong connection between game symmetries and graph automorphisms, yielding graph automorphism and graph isomorphism completeness results for characterizing the symmetries present in a game. On the other hand, we also show that the problem becomes polynomial-time solvable when we restrict the consideration of actions in one of two ways. Next, we investigate when exactly game symmetries can be successfully leveraged for Nash equilibrium computation. We show that finding a Nash equilibrium that respects a given set of symmetries is PPAD- and CLS-complete in general-sum and team games respectively -- that is, exactly as hard as Brouwer fixed point and gradient descent problems. Finally, we present polynomial-time methods for the special cases where we are aware of a vast number of symmetries, or where the game is two-player zero-sum and we do not even know the symmetries.
Barriers to Welfare Maximization with No-Regret Learning
Anagnostides, Ioannis, Kalavasis, Alkis, Sandholm, Tuomas
A celebrated result in the interface of online learning and game theory guarantees that the repeated interaction of no-regret players leads to a coarse correlated equilibrium (CCE) -- a natural game-theoretic solution concept. Despite the rich history of this foundational problem and the tremendous interest it has received in recent years, a basic question still remains open: how many iterations are needed for no-regret players to approximate an equilibrium? In this paper, we establish the first computational lower bounds for that problem in two-player (general-sum) games under the constraint that the CCE reached approximates the optimal social welfare (or some other natural objective). From a technical standpoint, our approach revolves around proving lower bounds for computing a near-optimal $T$-sparse CCE -- a mixture of $T$ product distributions, thereby circumscribing the iteration complexity of no-regret learning even in the centralized model of computation. Our proof proceeds by extending a classical reduction of Gilboa and Zemel [1989] for optimal Nash to sparse (approximate) CCE. In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in polynomial time. Moreover, we strengthen our hardness results to apply in the low-precision regime as well via the planted clique conjecture.
Computational Lower Bounds for Regret Minimization in Normal-Form Games
Anagnostides, Ioannis, Kalavasis, Alkis, Sandholm, Tuomas
A celebrated connection in the interface of online learning and game theory establishes that players minimizing swap regret converge to correlated equilibria (CE) -- a seminal game-theoretic solution concept. Despite the long history of this problem and the renewed interest it has received in recent years, a basic question remains open: how many iterations are needed to approximate an equilibrium under the usual normal-form representation? In this paper, we provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal. In particular, we prove lower bounds for the problem of computing a CE that can be expressed as a uniform mixture of $T$ product distributions -- namely, a uniform $T$-sparse CE; such lower bounds immediately circumscribe (computationally bounded) regret minimization algorithms in games. Our results are obtained in the algorithmic framework put forward by Kothari and Mehta (STOC 2018) in the context of computing Nash equilibria, which consists of the sum-of-squares (SoS) relaxation in conjunction with oracle access to a verification oracle; the goal in that framework is to lower bound either the degree of the SoS relaxation or the number of queries to the verification oracle. Here, we obtain two such hardness results, precluding computing i) uniform $\text{log }n$-sparse CE when $\epsilon =\text{poly}(1/\text{log }n)$ and ii) uniform $n^{1 - o(1)}$-sparse CE when $\epsilon = \text{poly}(1/n)$.
Optimal Correlated Equilibria in General-Sum Extensive-Form Games: Fixed-Parameter Algorithms, Hardness, and Two-Sided Column-Generation
Zhang, Brian, Farina, Gabriele, Celli, Andrea, Sandholm, Tuomas
We study the problem of finding optimal correlated equilibria of various sorts in extensive-form games: normal-form coarse correlated equilibrium (NFCCE), extensive-form coarse correlated equilibrium (EFCCE), and extensive-form correlated equilibrium (EFCE). We make two primary contributions. First, we introduce a new algorithm for computing optimal equilibria in all three notions. Its runtime depends exponentially only on a parameter related to the information structure of the game. We also prove a fundamental complexity gap: while our size bounds for NFCCE are similar to those achieved in the case of team games by Zhang et al., this is impossible to achieve for the other two concepts under standard complexity assumptions. Second, we propose a two-sided column generation approach for use when the runtime or memory usage of the previous algorithm is prohibitive. Our algorithm improves upon the one-sided approach of Farina et al. by means of a new decomposition of correlated strategies which allows players to re-optimize their sequence-form strategies with respect to correlation plans which were previously added to the support. Experiments show that our techniques outperform the prior state of the art for computing optimal general-sum correlated equilibria.
AlphaZeroES: Direct score maximization outperforms planning loss minimization
Martin, Carlos, Sandholm, Tuomas
Planning at execution time has been shown to dramatically improve performance for agents in both single-agent and multi-agent settings. A well-known family of approaches to planning at execution time are AlphaZero and its variants, which use Monte Carlo Tree Search together with a neural network that guides the search by predicting state values and action probabilities. AlphaZero trains these networks by minimizing a planning loss that makes the value prediction match the episode return, and the policy prediction at the root of the search tree match the output of the full tree expansion. AlphaZero has been applied to both single-agent environments (such as Sokoban) and multi-agent environments (such as chess and Go) with great success. In this paper, we explore an intriguing question: In single-agent environments, can we outperform AlphaZero by directly maximizing the episode score instead of minimizing this planning loss, while leaving the MCTS algorithm and neural architecture unchanged? To directly maximize the episode score, we use evolution strategies, a family of algorithms for zeroth-order blackbox optimization. Our experiments indicate that, across multiple environments, directly maximizing the episode score outperforms minimizing the planning loss.
Scalable Mechanism Design for Multi-Agent Path Finding
Friedrich, Paul, Zhang, Yulun, Curry, Michael, Dierks, Ludwig, McAleer, Stephen, Li, Jiaoyang, Sandholm, Tuomas, Seuken, Sven
Multi-Agent Path Finding (MAPF) involves determining paths for multiple agents to travel simultaneously through a shared area toward particular goal locations. This problem is computationally complex, especially when dealing with large numbers of agents, as is common in realistic applications like autonomous vehicle coordination. Finding an optimal solution is often computationally infeasible, making the use of approximate algorithms essential. Adding to the complexity, agents might act in a self-interested and strategic way, possibly misrepresenting their goals to the MAPF algorithm if it benefits them. Although the field of mechanism design offers tools to align incentives, using these tools without careful consideration can fail when only having access to approximately optimal outcomes. Since approximations are crucial for scalable MAPF algorithms, this poses a significant challenge. In this work, we introduce the problem of scalable mechanism design for MAPF and propose three strategyproof mechanisms, two of which even use approximate MAPF algorithms. We test our mechanisms on realistic MAPF domains with problem sizes ranging from dozens to hundreds of agents. Our findings indicate that they improve welfare beyond a simple baseline.