 equilibria


On Feasible Rewards in Multi-agent Inverse Reinforcement Learning

Neural Information Processing Systems

Multi-agent Inverse Reinforcement Learning (MAIRL) aims to recover agent reward functions from expert demonstrations. We characterize the feasible reward set in Markov games, identifying all reward functions that rationalize a given equilibrium. However, equilibrium-based observations are often ambiguous: a single Nash equilibrium can correspond to many reward structures, potentially changing the game's nature in multi-agent systems. We address this by introducing entropy-regularized Markov games, which yield a unique equilibrium while preserving strategic incentives. For this setting, we provide a sample complexity analysis detailing how errors affect learned policy performance. Our work establishes theoretical foundations and practical insights for MAIRL.


Evolutionary Prediction Games

Neural Information Processing Systems

When a prediction algorithm serves a collection of users, disparities in prediction quality are likely to emerge. If users respond to accurate predictions by increasing engagement, inviting friends, or adopting trends, repeated learning creates a feedback loop that shapes both the model and the population of its users. In this work, we introduce evolutionary prediction games, a framework grounded in evolutionary game theory which models such feedback loops as natural-selection processes among groups of users. Our theoretical analysis reveals a gap between idealized and real-world learning settings: in idealized settings with unlimited data and computational power, repeated learning creates competition and promotes competitive exclusion across a broad class of behavioral dynamics. However, under realistic constraints such as finite data, limited compute, or risk of overfitting, we show that stable coexistence and mutualistic symbiosis between groups become possible. We analyze these possibilities in terms of their stability and feasibility, present mechanisms that can sustain their existence, and empirically demonstrate our findings.
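The feedback loop described above can be illustrated with a toy discrete-time replicator update. This is only a sketch under loud assumptions: the fitness rule `accuracy_fitness` (a group's prediction accuracy grows linearly with its population share) and the parameters `base` and `gain` are hypothetical and are not taken from the paper. Under this rule the initially larger group takes over, which is one simple instance of competitive exclusion.

```python
def accuracy_fitness(shares, base=0.5, gain=0.5):
    """Hypothetical fitness: a group's accuracy rises with its population share.

    This functional form is an illustrative assumption, not the paper's model.
    """
    return [base + gain * p for p in shares]


def replicator_step(shares, fitness):
    """One discrete-time replicator update: shares grow in proportion to fitness."""
    f = fitness(shares)
    mean_fitness = sum(p * fi for p, fi in zip(shares, f))
    return [p * fi / mean_fitness for p, fi in zip(shares, f)]


def simulate(shares, steps=200):
    """Iterate the replicator update and return the final population shares."""
    for _ in range(steps):
        shares = replicator_step(shares, accuracy_fitness)
    return shares


final_shares = simulate([0.6, 0.4])
print(final_shares)  # the first group's share approaches 1
```

Starting from exactly equal shares the population stays put, since both groups then have identical fitness; any small asymmetry is amplified over time.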


Robust Equilibria in Continuous Games: From Strategic to Dynamic Robustness

Neural Information Processing Systems

In this paper, we examine the robustness of Nash equilibria in continuous games, under both strategic and dynamic uncertainty. Starting with the former, we introduce the notion of a robust equilibrium as an equilibrium that remains invariant to small--but otherwise arbitrary--perturbations to the game's payoff structure, and we provide a crisp geometric characterization thereof. Subsequently, we turn to the question of dynamic robustness, and we examine which equilibria may arise as stable limit points of the dynamics of "follow the regularized leader" (FTRL) in the presence of randomness and uncertainty. Despite their very distinct origins, we establish a structural correspondence between these two notions of robustness: strategic robustness implies dynamic robustness, and, conversely, the requirement of strategic robustness cannot be relaxed if dynamic robustness is to be maintained. Finally, we examine the rate of convergence to robust equilibria as a function of the underlying regularizer, and we show that entropically regularized learning converges at a geometric rate in games with affinely constrained action spaces.
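As a rough illustration of entropically regularized FTRL, the sketch below runs the deterministic, finite-game special case (the logit or exponential-weights map applied to cumulative payoffs) on a Prisoner's Dilemma. The payoff matrices, the step size `eta`, and all function names are assumptions chosen for illustration; the paper itself treats continuous games with randomness, which this sketch does not model.

```python
import math


def logit(scores):
    """Entropic FTRL choice map: softmax of the cumulative payoff scores."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def payoff_vector(matrix, opponent):
    """Expected payoff of each own pure action against a mixed opponent strategy."""
    return [sum(a * q for a, q in zip(row, opponent)) for row in matrix]


def entropic_ftrl(own_a, own_b, steps=200, eta=0.5):
    """Run entropic FTRL for both players of a two-player finite game.

    own_a[i][j]: payoff to player 1 for own action i against opponent action j.
    own_b[j][i]: payoff to player 2 for own action j against opponent action i.
    """
    y1 = [0.0] * len(own_a)
    y2 = [0.0] * len(own_b)
    for _ in range(steps):
        x1, x2 = logit(y1), logit(y2)
        v1 = payoff_vector(own_a, x2)
        v2 = payoff_vector(own_b, x1)
        # Aggregate payoffs; the regularizer enters only through the logit map.
        y1 = [y + eta * v for y, v in zip(y1, v1)]
        y2 = [y + eta * v for y, v in zip(y2, v2)]
    return logit(y1), logit(y2)


# Illustrative Prisoner's Dilemma (action 0 = cooperate, 1 = defect).
PD = [[3.0, 0.0], [5.0, 1.0]]
x1, x2 = entropic_ftrl(PD, PD)
print(x1, x2)  # both players put almost all mass on "defect"
```

In this example defecting beats cooperating by at least 1 per round, so the score gap grows at least linearly and the probability of cooperating shrinks at least as fast as `exp(-eta * t)`, a simple instance of the geometric rate mentioned above.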


Near-Optimal Quantum Algorithms for Computing (Coarse) Correlated Equilibria of General-Sum Games

Neural Information Processing Systems

Computing Nash equilibria of zero-sum games in classical and quantum settings has been extensively studied. For general-sum games, computing Nash equilibria is PPAD-hard, and the computation of a more general solution concept, correlated equilibria, has been widely explored in game theory. In this paper, we initiate the study of quantum algorithms for computing $\varepsilon$-approximate correlated equilibria (CE) and coarse correlated equilibria (CCE) in multi-player normal-form games. Our approach utilizes quantum improvements to the multi-scale Multiplicative Weight Update (MWU) method for CE calculations, achieving a query complexity of $\tilde{O}(m\sqrt{n})$ for fixed $\varepsilon$. For CCE, we extend techniques from quantum algorithms for zero-sum games to multi-player settings, achieving query complexity $\tilde{O}(m\sqrt{n}/\varepsilon^{2.5})$. Both algorithms demonstrate a near-optimal scaling in the number of players $m$ and actions $n$, as confirmed by our quantum query lower bounds.


The Complexity of Correlated Equilibria in Generalized Games

Neural Information Processing Systems

Correlated equilibria, and their generalization $\Phi$-equilibria, are a fundamental object of study in game theory, offering a more tractable alternative to Nash equilibria in multi-player settings. While computational aspects of equilibrium computation are well-understood in some settings, fundamental questions are still open in _generalized games_, that is, games in which the set of strategies allowed to each player depends on the other players' strategies. These classes of games model fundamental settings in economics and have been a cornerstone of economics research since the seminal paper of Arrow and Debreu [1954]. Recently, there has been growing interest, both in economics and in computer science, in studying correlated equilibria in generalized games. It is known that finding a social welfare maximizing correlated equilibrium in generalized games is NP-hard. However, the existence of efficient algorithms to find _any_ equilibrium remains an important open question. In this paper, we answer this question negatively, showing that this problem is PPAD-complete.


Certifying Concavity and Monotonicity in Games via Sum-of-Squares Hierarchies

Neural Information Processing Systems

Concavity and its refinements underpin tractability in multiplayer games, where players independently choose actions to maximize their own payoffs, which depend on other players' actions. In games where players' strategy sets are compact and convex, and their payoffs are concave in their own actions, strong guarantees follow: Nash equilibria always exist and decentralized algorithms converge to equilibria. If the game is furthermore monotone, an even stronger guarantee holds: Nash equilibria are unique under strictness assumptions. Unfortunately, we show that certifying concavity or monotonicity is NP-hard, already for games where utilities are multivariate polynomials and strategy sets are compact, convex basic semialgebraic sets--an expressive class that captures extensive-form games with imperfect recall. On the positive side, we develop two hierarchies of sum-of-squares programs that certify concavity and monotonicity of a given game, and each level of the hierarchies can be solved in polynomial time. We show that almost all concave/monotone games are certified at some finite level of the hierarchies. Subsequently, we introduce the classes of SOS-concave/monotone games, which globally approximate concave/monotone games, and show that for any given game we can compute the closest SOS-concave/monotone game in polynomial time. Finally, we apply our techniques to canonical examples of extensive-form games with imperfect recall.


Strategic stability under regularized learning in games

Neural Information Processing Systems

In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated equilibria; however, our understanding of how the players' actual strategies evolve over time is much more limited - and, in many cases, non-existent. This issue is exacerbated by a series of recent results showing that only strict Nash equilibria are stable and attracting under regularized learning, thus making the relation between learning and pointwise solution concepts particularly elusive. In lieu of this, we take a more general approach and instead seek to characterize the setwise rationality properties of the players' day-to-day play. To that end, we focus on one of the most stringent criteria of setwise strategic stability, namely that any unilateral deviation from the set in question incurs a cost to the deviator - a property known as closedness under better replies (club).
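To make the coarse correlated equilibrium (CCE) condition mentioned above concrete, the following sketch computes how much any player could gain by committing to a fixed action instead of following a joint distribution over action profiles. A distribution is a CCE exactly when this gap is at most zero. The function name `cce_gap` and the game of Chicken payoffs are illustrative assumptions, not taken from the paper.

```python
def cce_gap(A, B, p):
    """Largest gain from a fixed unilateral deviation against joint distribution p.

    A[i][j], B[i][j]: payoffs to the row and column player at profile (i, j).
    p[i][j]: probability of profile (i, j). p is a CCE iff the gap is <= 0.
    """
    n, m = len(A), len(A[0])
    u_row = sum(p[i][j] * A[i][j] for i in range(n) for j in range(m))
    u_col = sum(p[i][j] * B[i][j] for i in range(n) for j in range(m))
    # A fixed deviation only faces the opponent's marginal distribution.
    col_marginal = [sum(p[i][j] for i in range(n)) for j in range(m)]
    row_marginal = [sum(p[i][j] for j in range(m)) for i in range(n)]
    best_row = max(sum(col_marginal[j] * A[i][j] for j in range(m)) for i in range(n))
    best_col = max(sum(row_marginal[i] * B[i][j] for i in range(n)) for j in range(m))
    return max(best_row - u_row, best_col - u_col)


# Illustrative game of Chicken (action 0 = dare, 1 = chicken).
A = [[0.0, 7.0], [2.0, 6.0]]
B = [[0.0, 2.0], [7.0, 6.0]]

# Uniform over the three profiles other than (dare, dare).
mediated = [[0.0, 1 / 3], [1 / 3, 1 / 3]]
# All mass on (dare, dare), which either player would abandon.
crash = [[1.0, 0.0], [0.0, 0.0]]

print(cce_gap(A, B, mediated))  # negative: no fixed deviation helps
print(cce_gap(A, B, crash))     # positive: not a CCE
```

Feeding the empirical distribution of play from a no-regret algorithm into such a check is one way to observe the convergence result quoted in the abstract, since the gap is bounded by the players' average regret.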


Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition

Neural Information Processing Systems

As the scale of machine learning models increases, trends such as scaling laws anticipate consistent downstream improvements in predictive accuracy. However, these trends take the perspective of a single model-provider in isolation, while in reality providers often compete with each other for users. In this work, we demonstrate that competition can fundamentally alter the behavior of these scaling trends, even causing overall predictive accuracy across users to be non-monotonic or decreasing with scale. We define a model of competition for classification tasks, and use data representations as a lens for studying the impact of increases in scale. We find many settings where improving data representation quality (as measured by Bayes risk) decreases the overall predictive accuracy across users (i.e., social welfare) for a marketplace of competing model-providers. Our examples range from closed-form formulas in simple settings to simulations with pretrained representations on CIFAR-10. At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare in marketplaces with multiple model providers.


statements and

Neural Information Processing Systems

Consider a two-player Markov game where both players affect the transition. We will effectively show that the problem of best-responding to a correlated policy σ is equivalent to best-responding to the marginal policy of σ for the opponent. The proof follows from the equivalence of the two MDPs.

Before that, given a (possibly correlated) joint policy σ we define a nonlinear program, (PBR), whose optimal solutions are best-response policies of each agent k to σ_{-k} and the values for each state s and timestep h:

A.2 Proof of Theorem 3.2

The best-response program. First, we state the following lemma that will prove useful for several of our arguments.

Lemma A.1 (Best-response LP).