Marris, Luke
Re-evaluating Open-ended Evaluation of Large Language Models
Liu, Siqi, Gemp, Ian, Marris, Luke, Piliouras, Georgios, Heess, Nicolas, Lanctot, Marc
Evaluation has traditionally focused on ranking candidates for a specific skill. Modern generalist models, such as Large Language Models (LLMs), decidedly outpace this paradigm. Open-ended evaluation systems, where candidate models are compared on user-submitted prompts, have emerged as a popular solution. Despite their many advantages, we show that current Elo-based rating systems can be susceptible to, and even reinforce, intentional or accidental biases in the data, due to their sensitivity to redundancies. To address this issue, we propose evaluation as a 3-player game, and introduce novel game-theoretic solution concepts to ensure robustness to redundancy. We show that our method leads to intuitive ratings and provide insights into the competitive landscape of LLM development.
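The redundancy sensitivity described above can be reproduced in a few lines. The sketch below (hypothetical win rates, not data from the paper) fits Bradley-Terry ratings, the model underlying Elo, to a three-model win matrix, then clones one model; no matchup outcome changes, yet the gap between the remaining models' ratings shrinks.

```python
import numpy as np

def elo_ratings(win, n_iter=10000, lr=0.2):
    """Fit Elo-style (Bradley-Terry) ratings to pairwise win rates.

    win[i, j] is the empirical probability that model i beats model j.
    Ratings are fit by gradient ascent on the Bradley-Terry
    log-likelihood, then centred (they are only defined up to a constant).
    """
    n = win.shape[0]
    r = np.zeros(n)
    for _ in range(n_iter):
        # Predicted win probabilities under the logistic (Elo) model.
        p = 1.0 / (1.0 + np.exp(r[None, :] - r[:, None]))
        np.fill_diagonal(p, 0.5)
        r += lr * (win - p).sum(axis=1)  # log-likelihood gradient
        r -= r.mean()
    return r

# Hypothetical win-rate matrix for models A, B, C (win[i, j] = P(i beats j)).
base = np.array([
    [0.5, 0.6, 0.6],
    [0.4, 0.5, 0.6],
    [0.4, 0.4, 0.5],
])

# Clone model C: duplicate its row and column. No pairwise outcome changed,
# yet the fitted gap between A and B moves.
clone = np.array([
    [0.5, 0.6, 0.6, 0.6],
    [0.4, 0.5, 0.6, 0.6],
    [0.4, 0.4, 0.5, 0.5],
    [0.4, 0.4, 0.5, 0.5],
])
r_base, r_clone = elo_ratings(base), elo_ratings(clone)
print(r_base[0] - r_base[1], r_clone[0] - r_clone[1])
```

A clone-invariant rating, by contrast, is designed so that duplicating C leaves the A vs B comparison untouched.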
Deviation Ratings: A General, Clone-Invariant Rating Method
Marris, Luke, Liu, Siqi, Gemp, Ian, Piliouras, Georgios, Lanctot, Marc
Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed-motive) interactions. These strategic interactions may be agentic (e.g., players aiming to win) or stem from inherent trade-offs. In such a formulation, it is the strategies (actions, policies, agents, models, tasks, prompts, etc.) that are rated. However, the rating problem is complicated by redundancy and the complexity of N-player strategic interactions. Repeated or similar strategies can distort ratings for those that counter or complement them. Previous work proposed "clone invariant" ratings to handle such redundancies, but was limited to the two-player zero-sum setting. This work introduces the first N-player general-sum clone-invariant rating, called deviation ratings, based on coarse correlated equilibria. The rating is explored on several domains including LLM evaluation. Data often captures relationships within a set (e.g., chess match outcomes) or between sets (e.g., film ratings by demographics). These sets can represent anything, including human players, machine learning models, tasks, or features. The interaction data, often scalar (win rates, scores, or other metrics), may be symmetric, asymmetric, or arbitrary. These interactions can be strategic, either in an agentic sense (e.g., players aiming to win) or due to inherent trade-offs (e.g., cost vs quality). This can lead to a game-theoretic interpretation: sets as players, elements as strategies, and interaction statistics as payoffs. This framing is common in analyzing strategic interactions between entities like Premier League teams, chess players (Sanjaya et al., 2022), reinforcement learning agents and tasks (Balduzzi et al., 2018), or even language models (Chiang et al., 2024). More generally, the idea of formulating real-world interactions as normal-form games, empirical game-theoretic analysis (Wellman, 2006), is well explored.
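The deviation condition underlying coarse correlated equilibria, on which deviation ratings are built, is easy to state in code. Below is a minimal two-player checker (not the paper's rating algorithm), applied to the classic game of Chicken with hypothetical payoffs:

```python
import numpy as np

def cce_gains(u0, u1, joint):
    """Best unilateral-deviation gain for each player against a joint
    distribution over action profiles. Both gains <= 0 iff the joint
    distribution is a coarse correlated equilibrium (CCE)."""
    # A deviating player commits to one action before the joint draw,
    # so it faces the opponent's marginal distribution.
    dev0 = (u0 @ joint.sum(axis=0)).max() - (joint * u0).sum()
    dev1 = (joint.sum(axis=1) @ u1).max() - (joint * u1).sum()
    return dev0, dev1

# A game of Chicken (hypothetical payoffs; actions: Dare, Swerve).
u0 = np.array([[0.0, 7.0],
               [2.0, 6.0]])
u1 = u0.T
# Correlation device: mix evenly over (Dare, Swerve) and (Swerve, Dare).
joint = np.array([[0.0, 0.5],
                  [0.5, 0.0]])
print(cce_gains(u0, u1, joint))  # both gains <= 0, so this is a CCE
```

A deviation rating is built from exactly these gains, evaluated at an equilibrium of the rated game.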
Convex Markov Games: A Framework for Fairness, Imitation, and Creativity in Multi-Agent Learning
Gemp, Ian, Haupt, Andreas, Marris, Luke, Liu, Siqi, Piliouras, Georgios
Expert imitation, behavioral diversity, and fairness preferences give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist under strict convexity. Furthermore, equilibria can be approximated efficiently by performing gradient descent on an upper bound of exploitability. Our experiments imitate human choices in ultimatum games, reveal novel solutions to the repeated prisoner's dilemma, and find fair solutions in a repeated asymmetric coordination game. In the prisoner's dilemma, our algorithm finds a policy profile that deviates from observed human play only slightly, yet achieves higher per-player utility while also being three orders of magnitude less exploitable.
States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers
Gemp, Ian, Bachrach, Yoram, Lanctot, Marc, Patel, Roma, Dasagi, Vibhavari, Marris, Luke, Piliouras, Georgios, Liu, Siqi, Tuyls, Karl
Game theory is the study of mathematical models of strategic interactions among rational agents. Language is a key medium of interaction for humans, though it has historically proven difficult to model dialogue and its strategic motivations mathematically. A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language. In other words, a binding could provide a route to computing stable, rational conversational strategies in dialogue. Large language models (LLMs) have arguably reached a point where their generative capabilities can enable realistic, human-like simulations of natural dialogue. By prompting them in various ways, we can steer their responses towards different output utterances. Leveraging the expressivity of natural language, LLMs can also help us quickly generate new dialogue scenarios, which are grounded in real world applications. In this work, we present one possible binding from dialogue to game theory as well as generalizations of existing equilibrium finding algorithms to this setting. In addition, by exploiting LLMs' generation capabilities along with our proposed binding, we can synthesize a large repository of formally-defined games in which one can study and test game-theoretic solution concepts. We also demonstrate how one can combine LLM-driven game generation, game-theoretic solvers, and imitation learning to construct a process for improving the strategic capabilities of LLMs.
Approximating the Core via Iterative Coalition Sampling
Gemp, Ian, Lanctot, Marc, Marris, Luke, Mao, Yiran, Duéñez-Guzmán, Edgar, Perrin, Sarah, Gyorgy, Andras, Elie, Romuald, Piliouras, Georgios, Kaisers, Michael, Hennes, Daniel, Bullard, Kalesha, Larson, Kate, Bachrach, Yoram
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches; namely solving large linear programs. As such, they scale better to very large problems as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.
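For intuition, the core condition can be checked by brute force on tiny games, which is exactly the enumeration bottleneck the paper's sampling approach is designed to avoid. A sketch, assuming a small hypothetical weighted voting game:

```python
from itertools import combinations

def weighted_voting_value(coalition, weights, quota):
    """Characteristic function of a weighted voting game: a coalition is
    worth 1 if its total weight meets the quota, else 0."""
    return 1.0 if sum(weights[i] for i in coalition) >= quota else 0.0

def max_deficit(allocation, weights, quota):
    """Largest excess v(C) - payments(C) over all nonempty coalitions C.
    The allocation is in the core iff this is <= 0; the least-core
    minimises it. Brute force over 2^n coalitions, so toy-sized only."""
    n = len(weights)
    worst = float("-inf")
    for size in range(1, n + 1):
        for coalition in combinations(range(n), size):
            deficit = weighted_voting_value(coalition, weights, quota) - sum(
                allocation[i] for i in coalition)
            worst = max(worst, deficit)
    return worst

# Hypothetical 3-player weighted voting game: weights (2, 1, 1), quota 3.
# Player 0 is a veto player, so paying them everything is in the core.
weights, quota = [2, 1, 1], 3
print(max_deficit([1.0, 0.0, 0.0], weights, quota))    # 0.0: in the core
print(max_deficit([0.5, 0.25, 0.25], weights, quota))  # 0.25: blocked
```

The iterative algorithms in the paper replace this exhaustive loop with sampled coalitions, trading exactness for scalability.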
Neural Population Learning beyond Symmetric Zero-sum Games
Liu, Siqi, Marris, Luke, Lanctot, Marc, Piliouras, Georgios, Leibo, Joel Z., Heess, Nicolas
We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We show empirical convergence in a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO to complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives.
Evaluating Agents using Social Choice Theory
Lanctot, Marc, Larson, Kate, Bachrach, Yoram, Marris, Luke, Li, Zun, Bhoopchand, Avishkar, Anthony, Thomas, Tanner, Brian, Koop, Anna
We argue that many general evaluation problems can be viewed through the lens of voting theory. Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation. By viewing the aggregator as a social welfare function, we are able to leverage centuries of research in social choice theory to derive principled evaluation frameworks with axiomatic foundations. These evaluations are interpretable and flexible, while avoiding many of the problems currently facing cross-task evaluation. We apply this Voting-as-Evaluation (VasE) framework across multiple settings, including reinforcement learning, large language models, and humans. In practice, we observe that VasE can be more robust than popular evaluation frameworks (Elo and Nash averaging), discovers properties in the evaluation data not evident from scores alone, and can predict outcomes better than Elo in a complex seven-player game. We identify one particular approach, maximal lotteries, that satisfies important consistency properties relevant to evaluation, is computationally efficient (polynomial in the size of the evaluation data), and identifies game-theoretic cycles.
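As a toy illustration of tasks-as-voters (a hypothetical example, and using Copeland scoring for brevity rather than the maximal lotteries method the paper highlights):

```python
import numpy as np

def copeland_scores(rankings):
    """Copeland aggregation: each task contributes an ordinal ranking
    (lower number = better). Agent i scores +1 for each opponent j that
    a majority of tasks rank i above, and +0.5 for an exact tie."""
    rankings = np.asarray(rankings)          # shape: (tasks, agents)
    n_tasks, n_agents = rankings.shape
    scores = np.zeros(n_agents)
    for i in range(n_agents):
        for j in range(n_agents):
            if i == j:
                continue
            majority = (rankings[:, i] < rankings[:, j]).sum()
            if majority * 2 > n_tasks:
                scores[i] += 1.0
            elif majority * 2 == n_tasks:
                scores[i] += 0.5
    return scores

# Hypothetical evaluation: 3 tasks each rank 3 agents A, B, C (0 = best).
rankings = [
    [0, 1, 2],  # task 1: A > B > C
    [0, 2, 1],  # task 2: A > C > B
    [2, 0, 1],  # task 3: B > C > A
]
print(copeland_scores(rankings))  # [2. 1. 0.]: A is the Condorcet winner
```

Only ordinal information is used, so the aggregation is insensitive to the scale of per-task scores, one of the robustness properties the paper argues for.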
Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization
Gemp, Ian, Marris, Luke, Piliouras, Georgios
For example, the first column indicates the payoff when all background players play action 0. The second column indicates that all background players play action 0 except for one, which plays action 1, and so on. The last column indicates that all background players play action 1. These $2n$ scalars uniquely define the payoffs of a symmetric game. Given that this game only has two actions, we represent a mixed strategy by a single scalar $p \in [0, 1]$, i.e., the probability of the first action. Furthermore, this game is symmetric and we seek a symmetric equilibrium, so we can represent a full Nash equilibrium by this single scalar $p$. This reduces our search space from $7 \times 2 = 14$ variables to 1 variable (and obviates any need for a map $s$ from the unit hypercube to the simplex; see Lemma 25).
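That one-variable reduction makes the equilibrium search trivial to sketch. Below, a hypothetical symmetric 2-action game (Hawk-Dove style payoffs, not the game from the excerpt) is solved by grid search over the single scalar $p$, minimizing exploitability:

```python
import numpy as np

# Hypothetical symmetric 2x2 game (Hawk-Dove style):
# A[i, j] = payoff to a player choosing action i against action j.
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])

def exploitability(p):
    """Gain from the best unilateral deviation against a population
    playing action 0 with probability p. Zero exactly at a symmetric
    Nash equilibrium."""
    strategy = np.array([p, 1.0 - p])
    payoffs = A @ strategy        # payoff of each pure action vs p
    value = strategy @ payoffs    # payoff of playing p oneself
    return payoffs.max() - value

# One-dimensional search over p, as in the symmetric-game reduction.
grid = np.linspace(0.0, 1.0, 10001)
p_star = grid[np.argmin([exploitability(p) for p in grid])]
print(round(float(p_star), 3))  # 0.5: the mixed symmetric equilibrium
```

For these payoffs the indifference condition gives $p = 0.5$ analytically, which the grid search recovers; the paper's stochastic optimization plays the same role in games far too large for grids.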
Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games
Marris, Luke, Gemp, Ian, Piliouras, Georgios
Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.
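One such invariance is easy to verify directly: adding a constant offset per opponent action to a player's payoffs changes none of that player's best responses, and hence no equilibrium. A minimal check on a hypothetical $2\times2$ game (the offsets and payoffs below are illustrative, not the paper's parameterization):

```python
import numpy as np

# Hypothetical 2x2 game, row player's payoffs.
R = np.array([[3.0, 0.0],
              [5.0, 1.0]])

def best_responses(R, col_strategy):
    """Set of row actions that are best responses to the column strategy."""
    payoffs = R @ col_strategy
    return set(np.flatnonzero(np.isclose(payoffs, payoffs.max())))

# Shift the row player's payoffs by a constant per column action.
offsets = np.array([10.0, -2.0])
R_shifted = R + offsets[None, :]   # adds offsets[j] to every entry in column j

for q in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    q = np.array(q)
    assert best_responses(R, q) == best_responses(R_shifted, q)
print("best responses invariant under per-column offsets")
```

Quotienting the space of games by such transforms is what yields the low-dimensional embedding: the eight raw payoffs of a $2\times2$ game collapse to two angles.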
Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers
Marris, Luke, Gemp, Ian, Anthony, Thomas, Tacchetti, Andrea, Liu, Siqi, Tuyls, Karl
Solution concepts such as Nash Equilibria, Correlated Equilibria, and Coarse Correlated Equilibria are useful components for many multiagent machine learning algorithms. Unfortunately, solving a normal-form game could take prohibitive or non-deterministic time to converge, and could fail. We introduce the Neural Equilibrium Solver which utilizes a special equivariant neural network architecture to approximately solve the space of all games of fixed shape, buying speed and determinism. We define a flexible equilibrium selection framework, that is capable of uniquely selecting an equilibrium that minimizes relative entropy, or maximizes welfare. The network is trained without needing to generate any supervised training data. We show remarkable zero-shot generalization to larger games. We argue that such a network is a powerful component for many possible multiagent algorithms.