Maastricht University
Influence-Based Abstraction for Multiagent Systems
Oliehoek, Frans Adriaan (Maastricht University) | Witwicki, Stefan J. (INESC-ID) | Kaelbling, Leslie Pack (Massachusetts Institute of Technology)
This paper presents a theoretical advance by which factored POSGs can be decomposed into local models. We formalize the interface between such local models as the influence agents can exert on one another, and we prove that this interface is sufficient for decoupling them. The resulting influence-based abstraction substantially generalizes previous work on exploiting weakly coupled agent interaction structures. This yields several important contributions. First, our general formulation sheds new light on the theoretical relationships among previous approaches and paves the way for empirical comparisons obtained by extending those approaches beyond the more specific problem contexts for which they were developed. More importantly, the influence-based approaches that we generalize have shown promising improvements in the scalability of planning for more restrictive models. Our theoretical result thus serves as the foundation for practical algorithms that we anticipate will bring similar improvements to more general planning contexts, as well as to other settings such as approximate planning, decision making in adversarial domains, and online learning.
Tree-Based Solution Methods for Multiagent POMDPs with Delayed Communication
Oliehoek, Frans Adriaan (Maastricht University) | Spaan, Matthijs T. J. (Delft University of Technology)
Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcast information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. This setting strictly generalizes having agents wait until the global information is available and is more appropriate for applications in which response time is critical. In this setting, however, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we overcome this problem by demonstrating that computation of the MPOMDP-DC backup can be structured as a tree and introducing two novel tree-based pruning techniques that exploit this structure in an effective way. We experimentally show that these methods have the potential to outperform naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.
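As an illustrative aside, pruning in POMDP-style backups discards value vectors that can never be maximal at any belief. The paper's incremental pruning relies on linear programs over belief space and a tree-structured backup; the sketch below shows only the simpler pointwise-dominance pre-check often used as a cheap first pass, with hypothetical vectors as data:

```python
def dominated(v, others):
    """A vector v is (weakly) pointwise-dominated if some other vector
    is at least as good in every state; such vectors are never maximal."""
    return any(all(w[s] >= v[s] for s in range(len(v)))
               for w in others if w is not v)

def prune(vectors):
    """Keep only vectors that are not pointwise-dominated.

    This is a cheap pre-step; full pruning additionally solves a linear
    program per vector to test dominance over the whole belief simplex.
    """
    return [v for v in vectors if not dominated(v, vectors)]

# Hypothetical 2-state value vectors: [0, 0] is dominated by [1, 1].
print(prune([[1, 1], [0, 0], [2, 0]]))
```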
FAQ-Learning in Matrix Games: Demonstrating Convergence Near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes
Kaisers, Michael (Maastricht University) | Tuyls, Karl (Maastricht University)
This article studies Frequency Adjusted Q-learning (FAQ-learning), a variation of Q-learning that simulates simultaneous value function updates. The main contributions are empirical and theoretical support for the convergence of FAQ-learning to attractors near Nash equilibria in two-agent two-action matrix games. These games can be divided into three types: Matching Pennies, Prisoners' Dilemma, and Battle of Sexes. This article shows that Matching Pennies and the Prisoners' Dilemma yield one attractor of the learning dynamics, while the Battle of Sexes exhibits a supercritical pitchfork bifurcation at a critical temperature, where one attractor splits into two attractors and one repellent fixed point. Experiments illustrate that the distance between fixed points of the FAQ-learning dynamics and Nash equilibria tends to zero as the exploration parameter of FAQ-learning approaches zero.
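The idea behind FAQ-learning is to rescale each Q-learning step by the inverse of the probability with which the updated action is selected (capped at 1), so that rarely chosen actions are updated as if all actions were updated simultaneously. A minimal self-play sketch in Matching Pennies, simplified to a stateless setting with Boltzmann exploration and illustrative parameter values, might look like:

```python
import math
import random

def boltzmann(q, tau):
    """Softmax policy with temperature tau over action values q."""
    m = max(q)
    e = [math.exp((v - m) / tau) for v in q]
    s = sum(e)
    return [v / s for v in e]

def faq_step(q, action, reward, policy, alpha, beta):
    """One FAQ-learning update for a stateless matrix game: the standard
    step alpha * (reward - Q) is scaled by min(beta / x(a), 1)."""
    lr = min(beta / policy[action], 1.0) * alpha
    q[action] += lr * (reward - q[action])
    return q

# Matching Pennies payoffs for the row player (zero-sum).
PAYOFF = [[1, -1], [-1, 1]]

random.seed(0)
q_row, q_col = [0.0, 0.0], [0.0, 0.0]
alpha, beta, tau = 0.1, 0.01, 0.2  # illustrative values
for _ in range(5000):
    p_row = boltzmann(q_row, tau)
    p_col = boltzmann(q_col, tau)
    a = random.choices([0, 1], weights=p_row)[0]
    b = random.choices([0, 1], weights=p_col)[0]
    r = PAYOFF[a][b]
    faq_step(q_row, a, r, p_row, alpha, beta)
    faq_step(q_col, b, -r, p_col, alpha, beta)

# Policies should remain close to the mixed Nash equilibrium (0.5, 0.5).
print(p_row, p_col)
```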
Reports of the AAAI 2010 Conference Workshops
Aha, David W. (Naval Research Laboratory) | Boddy, Mark (Adventium Labs) | Bulitko, Vadim (University of Alberta) | Garcez, Artur S. d'Avila (City University London) | Doshi, Prashant (University of Georgia) | Edelkamp, Stefan (TZI, Bremen University) | Geib, Christopher (University of Edinburgh) | Gmytrasiewicz, Piotr (University of Illinois, Chicago) | Goldman, Robert P. (Smart Information Flow Technologies) | Hitzler, Pascal (Wright State University) | Isbell, Charles (Georgia Institute of Technology) | Josyula, Darsana (University of Maryland, College Park) | Kaelbling, Leslie Pack (Massachusetts Institute of Technology) | Kersting, Kristian (University of Bonn) | Kunda, Maithilee (Georgia Institute of Technology) | Lamb, Luis C. (Universidade Federal do Rio Grande do Sul (UFRGS)) | Marthi, Bhaskara (Willow Garage) | McGreggor, Keith (Georgia Institute of Technology) | Nastase, Vivi (EML Research gGmbH) | Provan, Gregory (University College Cork) | Raja, Anita (University of North Carolina, Charlotte) | Ram, Ashwin (Georgia Institute of Technology) | Riedl, Mark (Georgia Institute of Technology) | Russell, Stuart (University of California, Berkeley) | Sabharwal, Ashish (Cornell University) | Smaus, Jan-Georg (University of Freiburg) | Sukthankar, Gita (University of Central Florida) | Tuyls, Karl (Maastricht University) | Meyden, Ron van der (University of New South Wales) | Halevy, Alon (Google, Inc.) | Mihalkova, Lilyana (University of Maryland) | Natarajan, Sriraam (University of Wisconsin)
The AAAI-10 Workshop program was held Sunday and Monday, July 11–12, 2010 at the Westin Peachtree Plaza in Atlanta, Georgia. The AAAI-10 workshop program included 13 workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Fun, Bridging the Gap between Task and Motion Planning, Collaboratively-Built Knowledge Sources and Artificial Intelligence, Goal-Directed Autonomy, Intelligent Security, Interactive Decision Theory and Game Theory, Metacognition for Robust Social Systems, Model Checking and Artificial Intelligence, Neural-Symbolic Learning and Reasoning, Plan, Activity, and Intent Recognition, Statistical Relational AI, Visual Representations and Reasoning, and Abstraction, Reformulation, and Approximation. This article presents short summaries of those events.
Integrating Opponent Models with Monte-Carlo Tree Search in Poker
Ponsen, Marc (Maastricht University) | Gerritsen, Geert (Maastricht University) | Chaslot, Guillaume (Maastricht University)
In this paper we apply a Monte-Carlo Tree Search implementation that is boosted with domain knowledge to the game of poker. More specifically, we integrate an opponent model into the Monte-Carlo Tree Search algorithm to produce a strong poker playing program. Opponent models allow the search algorithm to focus on relevant parts of the game tree. We use an opponent modelling approach that starts from a (learned) prior, i.e., general expectations about opponent behavior, and then learns a relational regression-tree function that adapts these priors to specific opponents. Our modelling approach can generate detailed game features or relations on the fly. Additionally, using a prior we can already make reasonable predictions even when limited experience is available for a particular player. We show that Monte-Carlo Tree Search with integrated opponent models performs well against state-of-the-art poker programs.
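One common way to combine the two components, sketched below with a hypothetical node representation (not the paper's implementation): at nodes where the search player acts, use standard UCT selection; at opponent nodes, sample the action from the opponent model's predicted distribution so that simulations concentrate on the opponent's likely play.

```python
import math
import random

def select_child(node, c=1.4):
    """UCT selection at nodes where the search player moves."""
    total = sum(ch["visits"] for ch in node["children"]) or 1
    def uct(ch):
        if ch["visits"] == 0:
            return float("inf")  # always try unvisited actions first
        return ch["value"] / ch["visits"] + c * math.sqrt(math.log(total) / ch["visits"])
    return max(node["children"], key=uct)

def sample_opponent_action(node, opponent_model):
    """At opponent nodes, sample a child in proportion to the model's
    predicted action probabilities (with a small floor for robustness)."""
    probs = [opponent_model.get(ch["action"], 1e-6) for ch in node["children"]]
    total = sum(probs)
    return random.choices(node["children"], weights=[p / total for p in probs])[0]

# Hypothetical poker decision node with two abstract opponent actions.
demo_node = {"children": [
    {"action": "fold", "visits": 0, "value": 0.0},
    {"action": "call", "visits": 3, "value": 1.5},
]}
print(select_child(demo_node)["action"])  # the unvisited child is explored first
```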
MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling
Ponsen, Marc (Maastricht University) | Lanctot, Marc (University of Alberta) | Jong, Steven de (Maastricht University)
This paper presents a sample-based algorithm for the computation of restricted Nash strategies in complex extensive form games. Recent work indicates that regret-minimization algorithms using selective sampling, such as Monte-Carlo Counterfactual Regret Minimization (MCCFR), converge faster to Nash equilibrium (NE) strategies than their non-sampled counterparts, which perform a full tree traversal. In this paper, we show that MCCFR is also able to establish NE strategies in the complex domain of Poker. Although such strategies are defensive (i.e., safe to play), they are oblivious to opponent mistakes. We can thus achieve better performance by using (an estimation of) opponent strategies. The Restricted Nash Response (RNR) algorithm was proposed to learn robust counter-strategies given such knowledge. It solves a modified game, wherein it is assumed that opponents play according to a fixed strategy with a certain probability, or to a regret-minimizing strategy otherwise. We improve the rate of convergence of the RNR algorithm using sampling. Our new algorithm, MCRNR, samples only relevant parts of the game tree. It is therefore able to converge faster to robust best-response strategies than RNR. We evaluate our algorithm on a variety of imperfect information games that are small enough to solve yet large enough to be strategically interesting, as well as a large game, Texas Hold'em Poker.
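The two building blocks of the restricted game can be illustrated in a few lines: regret matching, which sets the current strategy proportional to positive cumulative regrets, and the RNR opponent, which mixes a fixed (modelled) strategy with a regret-minimizing one according to the restriction probability p. A minimal sketch, with illustrative inputs rather than the paper's full MCRNR loop:

```python
def regret_matching(regrets):
    """Current strategy proportional to positive cumulative regrets;
    uniform when no action has positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    n = len(regrets)
    return [r / s for r in pos] if s > 0 else [1.0 / n] * n

def restricted_strategy(fixed, learned, p):
    """RNR opponent: play the fixed (modelled) strategy with probability p
    and the regret-minimizing strategy otherwise."""
    return [p * f + (1 - p) * l for f, l in zip(fixed, learned)]

# Illustrative example: two positive-regret actions split the mass evenly,
# and p = 0.25 anchors the opponent a quarter of the way to the model.
print(regret_matching([2.0, -1.0, 2.0]))
print(restricted_strategy([1.0, 0.0], [0.0, 1.0], 0.25))
```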