Quantifying Skill and Chance: A Unified Framework for the Geometry of Games
We introduce a quantitative framework for separating skill and chance in games by modeling them as complementary sources of control over stochastic decision trees. We define the Skill-Luck Index S(G) in [-1, 1] by decomposing game outcomes into skill leverage K and luck leverage L, and we introduce volatility Sigma to quantify outcome uncertainty over successive turns. Applying this to 30 games reveals a continuum from pure chance (coin toss, S = -1) through mixed domains such as backgammon (S = 0, Sigma = 1.20) to pure skill (chess, S = +1, Sigma = 0). Poker exhibits moderate skill dominance (S = 0.33) with K = 0.40 +/- 0.03 and Sigma = 0.80. The framework extends to general stochastic decision systems, enabling principled comparisons of player influence, game balance, and predictive stability, with applications to game design, AI evaluation, and risk assessment.
- North America > United States (0.14)
- Europe > Netherlands > Limburg > Maastricht (0.04)
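The abstract does not reproduce the paper's exact formula for S(G), but a normalized difference of the two leverages is one natural definition consistent with the reported endpoints (S = -1 when skill leverage vanishes, S = +1 when luck leverage vanishes). A minimal sketch under that assumption, with the luck leverage for poker (L = 0.20) chosen purely for illustration:

```python
def skill_luck_index(K, L):
    """Hypothetical normalized skill-luck index in [-1, 1].

    K: skill leverage, L: luck leverage (both >= 0, not both zero).
    Pure chance (K = 0) gives -1; pure skill (L = 0) gives +1.
    NOTE: this is an assumed form, not the paper's stated definition.
    """
    return (K - L) / (K + L)

print(skill_luck_index(0.0, 1.0))              # coin toss   -> -1.0
print(skill_luck_index(1.0, 0.0))              # chess       -> +1.0
print(round(skill_luck_index(0.40, 0.20), 2))  # poker-like  -> 0.33
```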
Charting the Shapes of Stories with Game Theory
Daskalakis, Constantinos, Gemp, Ian, Jiang, Yanchen, Leme, Renato Paes, Papadimitriou, Christos, Piliouras, Georgios
Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to extract structure from stories and insights from structure. Historically, these tools have been restricted to one-dimensional charts and dynamic social networks; however, modern AI offers the possibility of more fully identifying the plot structure, character incentives, and, importantly, counterfactual plot lines that the story could have taken but did not take. In this work, we use AI to model the structure of stories as game-theoretic objects, amenable to quantitative analysis. This allows us to not only interrogate each character's decision making, but also possibly peer into the original author's conception of the characters' world. We demonstrate our proposed technique on Shakespeare's famous Romeo and Juliet. We conclude with a discussion of how our analysis could be replicated in broader contexts, including real-life scenarios.
- North America > United States > New York > New York County > New York City (0.15)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
Imperfect-Recall Games: Equilibrium Concepts and Their Complexity
Tewolde, Emanuel, Zhang, Brian Hu, Oesterheld, Caspar, Zampetakis, Manolis, Sandholm, Tuomas, Goldberg, Paul W., Conitzer, Vincent
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it previously held. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings across three different solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes P, PPAD, PLS, $\Sigma_2^P$ , $\exists$R, and $\exists \forall$R.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Machine Learning (0.67)
The Ludii Game Description Language is Universal
Soemers, Dennis J. N. J., Piette, Éric, Stephenson, Matthew, Browne, Cameron
There are several different game description languages (GDLs), each intended to allow wide ranges of arbitrary games (i.e., general games) to be described in a single higher-level language than general-purpose programming languages. Games described in such formats can subsequently be presented as challenges for automated general game playing agents, which are expected to be capable of playing any arbitrary game described in such a language without prior knowledge about the games to be played. The language used by the Ludii general game system was previously shown to be capable of representing equivalent games for any arbitrary, finite, deterministic, fully observable extensive-form game. In this paper, we prove its universality by extending this to include finite non-deterministic and imperfect-information games.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Europe > Netherlands > Limburg > Maastricht (0.04)
Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning
Hayes, Conor F., Reymond, Mathieu, Roijers, Diederik M., Howley, Enda, Mannion, Patrick
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Therefore, we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time by taking both the future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. Firstly, we present a Monte Carlo tree search algorithm that can compute policies for nonlinear utility functions (NLU-MCTS) by optimising the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Secondly, we propose a distributional Monte Carlo tree search algorithm (DMCTS) which extends NLU-MCTS. DMCTS computes an approximate posterior distribution over the utility of the returns, and utilises Thompson sampling during planning to compute policies in risk-aware and multi-objective settings. Both algorithms outperform the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- Asia > Singapore (0.04)
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- Energy (1.00)
- Leisure & Entertainment > Games (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
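The action-selection idea behind DMCTS (sample from a posterior over returns, then maximize utility of the sample) can be sketched in isolation. This is not the paper's algorithm: the Normal posterior, the risk-averse utility, and the `thompson_pick` helper are all illustrative assumptions.

```python
import random
import statistics

def thompson_pick(returns_by_action, utility, rng=random.Random(0)):
    """Pick an action by sampling one plausible return per action from a
    Normal posterior fitted to the returns observed under that action,
    then maximizing the utility of the sampled return (Thompson sampling)."""
    sampled = {}
    for a, rs in returns_by_action.items():
        mu = statistics.mean(rs)
        sigma = statistics.stdev(rs) / len(rs) ** 0.5 if len(rs) > 1 else 1.0
        sampled[a] = rng.gauss(mu, sigma)
    return max(sampled, key=lambda a: utility(sampled[a]))

# Hypothetical risk-averse utility: losses count double.
risk_averse = lambda r: r if r >= 0 else 2 * r
returns = {"safe": [1.0, 1.1, 0.9, 1.0], "risky": [4.0, -3.0, 3.5, -2.5]}
print(thompson_pick(returns, risk_averse))
```

Because the choice is driven by a utility of sampled returns rather than by the mean return alone, the same machinery serves both the risk-aware and the multi-objective settings the abstract describes.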
A model-based approach to meta-Reinforcement Learning: Transformers and tree search
Pinon, Brieuc, Delvenne, Jean-Charles, Jungers, Raphaël
Meta-learning is a line of research that develops the ability to leverage past experiences to efficiently solve new learning problems. Meta-Reinforcement Learning (meta-RL) methods demonstrate a capability to learn behaviors that efficiently acquire and exploit information in several meta-RL problems. In this context, the Alchemy benchmark has been proposed by Wang et al. [2021]. Alchemy features a rich structured latent space that is challenging for state-of-the-art model-free RL methods. These methods fail to learn to properly explore then exploit. We develop a model-based algorithm. We train a model whose principal block is a Transformer Encoder to fit the symbolic Alchemy environment dynamics. Then we define an online planner with the learned model using a tree search method. This algorithm significantly outperforms previously applied model-free RL methods on the symbolic Alchemy problem. Our results reveal the relevance of model-based approaches with online planning to perform exploration and exploitation successfully in meta-RL. Moreover, we show the efficiency of the Transformer architecture to learn complex dynamics that arise from latent spaces present in meta-RL problems.
- Research Report > Promising Solution (0.34)
- Research Report > New Finding (0.34)
- Education (0.48)
- Leisure & Entertainment > Games (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
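The planning step the abstract describes (an online planner over a learned dynamics model) can be sketched generically as depth-limited tree search. The stub model below stands in for the paper's learned Transformer dynamics; the shape of the search, the reward scheme, and all names are illustrative assumptions, not the paper's method.

```python
def plan(state, model, actions, depth):
    """Depth-limited exhaustive tree search. The learned model exposes
    model(state, action) -> (next_state, reward); returns the pair
    (best_cumulative_reward, best_first_action)."""
    if depth == 0:
        return 0.0, None
    best = (float("-inf"), None)
    for a in actions:
        nxt, r = model(state, a)
        future, _ = plan(nxt, model, actions, depth - 1)
        if r + future > best[0]:
            best = (r + future, a)
    return best

# Toy stub standing in for a learned dynamics model: walk on a line,
# earning reward for stepping right while staying at or below position 3.
toy_model = lambda s, a: (s + a, 1.0 if a == 1 and s + a <= 3 else 0.0)
print(plan(0, toy_model, [-1, +1], depth=3))  # (3.0, 1): step right three times
```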
Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice
Hammond, Lewis, Fox, James, Everitt, Tom, Abate, Alessandro, Wooldridge, Michael
Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing equilibria.

Previous work on MAIDs has focussed on Nash equilibria as the core solution concept [20]. Whilst this is arguably the most important solution concept in non-cooperative game theory, if there are many Nash equilibria we often wish to remove some of those that are less 'rational'. Many refinements to the Nash equilibrium have been proposed [17], with two of the most important being subgame perfect Nash equilibria [26] and trembling hand perfect equilibria [27]. The first rules out 'non-credible' threats and the second requires that each player is still playing a best-response when their
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Leisure & Entertainment > Games (0.66)
- Government > Military (0.46)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.85)
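In perfect-information EFGs, the subgame perfect refinements discussed above are computed by backward induction. A minimal sketch on the classic entry game (illustrative only, not the paper's MAID implementation): the incumbent's threat to "Fight" supports a Nash equilibrium but is non-credible, so subgame perfection discards it.

```python
def solve(node):
    """Backward induction on a perfect-information game tree.
    node = ("leaf", (u0, u1)) or ("move", label, player, {action: child}).
    Returns (payoff_vector, {label: chosen_action})."""
    if node[0] == "leaf":
        return node[1], {}
    _, label, player, children = node
    strategy, best = {}, None
    for action, child in children.items():
        payoffs, sub = solve(child)
        strategy.update(sub)
        if best is None or payoffs[player] > best[1][player]:
            best = (action, payoffs)
    strategy[label] = best[0]
    return best[1], strategy

# Entry game: the (Out, Fight) Nash equilibrium rests on a non-credible
# threat; backward induction selects Enter/Accommodate instead.
game = ("move", "entrant", 0, {
    "Out": ("leaf", (0, 2)),
    "Enter": ("move", "incumbent", 1, {
        "Fight": ("leaf", (-1, -1)),
        "Accommodate": ("leaf", (1, 1)),
    }),
})
payoffs, strategy = solve(game)
print(payoffs, strategy)  # (1, 1) {'incumbent': 'Accommodate', 'entrant': 'Enter'}
```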
Reasoning with Contextual Knowledge and Influence Diagrams
Influence diagrams (IDs) are well-known formalisms extending Bayesian networks to model decision situations under uncertainty. Although they are convenient as a decision theoretic tool, their knowledge representation ability is limited in capturing other crucial notions such as logical consistency. We complement IDs with the lightweight description logic (DL) EL to overcome such limitations. We consider a setup where DL axioms hold in some contexts, yet the actual context is uncertain. The framework benefits from the convenience of using DL as a domain knowledge representation language and the modelling strength of IDs to deal with decisions over contexts in the presence of contextual uncertainty. We define related reasoning problems and study their computational complexity.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
Causally Correct Partial Models for Reinforcement Learning
Rezende, Danilo J., Danihelka, Ivo, Papamakarios, George, Ke, Nan Rosemary, Jiang, Ray, Weber, Theophane, Gregor, Karol, Merzic, Hamza, Viola, Fabio, Wang, Jane, Mitrovic, Jovana, Besse, Frederic, Antonoglou, Ioannis, Buesing, Lars
In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don't model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, yet remain fast because they do not need to fully model future observations.
- Research Report (0.64)
- Overview (0.46)
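The confounding the abstract describes can be seen in a deliberately tiny numeric example (ours, not the paper's): when the behavior policy's action depends on an observation that the partial model drops, the conditional reward estimated from behavior data differs from the interventional reward that planning actually needs.

```python
# Hidden observation o in {0, 1}, uniform. The behavior policy plays
# a = o, and the reward is 1 exactly when the action matches the
# observation. A partial model that drops o and fits P(r | a) from
# behavior data concludes action 1 always pays off; intervening with
# do(a = 1) does not.

data = [(o, o, 1.0) for o in (0, 1)]  # (o, a, r) tuples under the behavior policy

# Observational estimate P(r = 1 | a = 1) from the behavior data.
obs_estimate = (sum(r for o, a, r in data if a == 1)
                / sum(1 for o, a, r in data if a == 1))

# Interventional value of do(a = 1): average the reward over P(o).
interventional = sum(0.5 * (1.0 if o == 1 else 0.0) for o in (0, 1))

print(obs_estimate, interventional)  # 1.0 0.5 -- the partial model is biased
```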
Learning Task Specifications from Demonstrations via the Principle of Maximum Causal Entropy
Vazquez-Chanlatte, Marcell, Seshia, Sanjit A.
In many settings (e.g., robotics) demonstrations provide a natural way to specify sub-tasks; however, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely composed and/or do not explicitly capture history dependencies. Motivated by this deficit, recent works have proposed specializing to task specifications, a class of Boolean non-Markovian rewards which admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum causal entropy inverse reinforcement learning to estimate the posterior probability of a specification given a multiset of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time-unrolled Markov Decision Process.
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
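The posterior the abstract targets can be sketched with a schematic Bayes update over candidate Boolean specifications. This is not the paper's maximum-causal-entropy likelihood: the exponential-satisfaction weighting, the temperature `beta`, and both toy specifications are illustrative assumptions.

```python
import math

def spec_posterior(specs, demos, beta=2.0):
    """Posterior over candidate Boolean specifications given demonstrations.
    Each spec is a predicate over a demonstration (a trace); a trace's
    log-likelihood under a spec is beta if the trace satisfies it, else 0."""
    def log_like(spec):
        return sum(beta * (1.0 if spec(d) else 0.0) for d in demos)
    logs = {name: log_like(spec) for name, spec in specs.items()}
    z = sum(math.exp(v) for v in logs.values())
    return {name: math.exp(v) / z for name, v in logs.items()}

# Traces are action strings; two toy candidate specifications.
demos = ["ab", "aab", "ab"]
specs = {
    "ends_with_b": lambda t: t.endswith("b"),
    "starts_with_b": lambda t: t.startswith("b"),
}
post = spec_posterior(specs, demos)
print(max(post, key=post.get))  # ends_with_b
```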