Eric B. Baum 1 NEC Research Institute, 4 Independence Way, Princeton NJ 08540 eric@research.NJ.NEC.COM Abstract The point of game tree search is to insulate oneself from errors in the evaluation function. The standard approach is to grow a full width tree as deep as time allows, and then value the tree as if the leaf evaluations were exact. This has been effective in many games because of the computational efficiency of the alpha-beta algorithm. A Bayesian would suggest instead to train a model of one's uncertainty. This model adds extra information in addition to the standard evaluation function. Within such a formal model, there is an optimal tree growth procedure and an optimal method of valueing the tree. We describe how to optimally value the tree, and how to approximate on line the optimal tree to search.
Bard, Nolan, Foerster, Jakob N., Chandar, Sarath, Burch, Neil, Lanctot, Marc, Song, H. Francis, Parisotto, Emilio, Dumoulin, Vincent, Moitra, Subhodeep, Hughes, Edward, Dunning, Iain, Mourad, Shibl, Larochelle, Hugo, Bellemare, Marc G., Bowling, Michael
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay and imperfect information in a two to five player setting. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques capable of imbuing artificial agents with such theory of mind will not only be crucial for their success in Hanabi, but also in broader collaborative efforts, and especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
Min, Wookhee (North Carolina State University) | Baikadi, Alok (University of Pittsburgh) | Mott, Bradford (North Carolina State University) | Rowe, Jonathan (North Carolina State University) | Liu, Barry (North Carolina State University) | Ha, Eun Young (IBM) | Lester, James (North Carolina State University)
Recent years have seen a growing interest in player modeling, which supports the creation of player-adaptive digital games. A central problem of player modeling is goal recognition, which aims to recognize players’ intentions from observable gameplay behaviors. Player goal recognition offers the promise of enabling games to dynamically adjust challenge levels, perform procedural content generation, and create believable NPC interactions. A growing body of work is investigating a wide range of machine learning-based goal recognition models. In this paper, we introduce GOALIE, a multidimensional framework for evaluating player goal recognition models. The framework integrates multiple metrics for player goal recognition models, including two novel metrics, n-early convergence rate and standardized convergence point . We demonstrate the application of the GOALIE framework with the evaluation of several player goal recognition models, including Markov logic network-based, deep feedforward neural network-based, and long short-term memory network-based goal recognizers on two different educational games. The results suggest that GOALIE effectively captures goal recognition behaviors that are key to next-generation player modeling.
This paper presents a constraint optimization approach to walling in real-time strategy (RTS) games. Walling is a specific type of spatial reasoning, typically employed by human expert players and not currently fully exploited in RTS game AI, consisting on finding configurations of buildings to completely or partially block paths. Our approach is based on local search, and is specifically designed for the real-time nature of RTS games. We present experiments in the context of the RTS game StarCraft showing promising results.
Michael Bowling and Manuela Veloso Computer Science Department Carnegie Mellon University Pittsburgh PA, 15213-3891 Abstract Stochastic games are a general model of interaction between multiple agents. They have recently been the focus of a great deal of research in reinforcement learning as they are both descriptive and have a well-defined Nash equilibrium solution. Most of this recent work, although very general, has only been applied to small games with at most hundreds of states. On the other hand, there are landmark results of learning being successfully applied to specific large and complex games such as Checkers and Backgammon. In this paper we describe a scalable learning algorithm for stochastic games, that combines three separate ideas from reinforcement learning into a single algorithm. These ideas are tile coding for generalization, policy gradient ascent as the basic learning method, and our previous work on the WoLF ("Win or Learn Fast") variable learning rate to encourage convergence. We apply this algorithm to the intractably sized game-theoretic card game Goofspiel, showing preliminary results of learning in self-play. We demonstrate that policy gradient ascent can learn even in this highly non-stationary problem with simultaneous learning. We also show that the WoLF principle continues to have a converging effect even in large problems with approximation and generalization. Introduction We are interested in the problem of learning in multiagent environments. One of the main challenges with these environments is that other agents in the environment may be learning and adapting as well.