Nguyen, Truong-Huy Dinh (National University of Singapore) | Hsu, David (National University of Singapore) | Lee, Wee-Sun (National University of Singapore) | Leong, Tze-Yun (National University of Singapore) | Kaelbling, Leslie Pack (Massachusetts Institute of Technology) | Lozano-Perez, Tomas (Massachusetts Institute of Technology) | Grant, Andrew Haydn (Singapore-MIT GAMBIT Game Lab)
We apply decision theoretic techniques to construct non-player characters that are able to assist a human player in collaborative games. The method is based on solving Markov decision processes, which can be difficult when the game state is described by many variables. To scale to more complex games, the method allows decomposition of a game task into subtasks, each of which can be modelled by a Markov decision process. Intention recognition is used to infer the subtask that the human is currently performing, allowing the helper to assist the human in performing the correct task. Experiments show that the method can be effective, giving near-human level performance in helping a human in a collaborative game.
This paper investigates methods for estimating the optimal stochastic control policy for a Markov Decision Process with unknown transition dynamics and an unknown reward function. This form of model-free reinforcement learning comprises many real world systems such as playing video games, simulated control tasks, and real robot locomotion. Existing methods for estimating the optimal stochastic control policy rely on high variance estimates of the policy descent. However, these methods are not guaranteed to find the optimal stochastic policy, and the high variance gradient estimates make convergence unstable. In order to resolve these problems, we propose a technique using Markov Chain Monte Carlo to generate samples from the posterior distribution of the parameters conditioned on being optimal. Our method provably converges to the globally optimal stochastic policy, and empirically similar variance compared to the policy gradient.
Opponents are characterized by a Bayesian network intended to guide Monte-Carlo Tree Search through the game tree of No-Limit Texas Hold'em Poker. By using a probabilistic model of opponents, the network is able to integrate all available sources of information, including the infrequent revelations of hidden beliefs. These revelations are biased, and as such are difficult to incorporate into action prediction. The proposed network mitigates this bias via the expectation maximization algorithm and a probabilistic characterization of the hidden variables that generate observations.
Beating humans in a penny-matching game by leveraging cognitive hierarchy theory and Bayesian learning Ran Tian, Nan Li, Ilya Kolmanovsky, and Anouck Girard Abstract -- It is a longstanding goal of artificial intelligence (AI) to be superior to human beings in decision making. Games are suitable for testing AI capabilities of making good decisions in non-numerical tasks. In this paper, we develop a new AI algorithm to play the penny-matching game considered in Shannon's "mind-reading machine" (1953) against human players. In particular, we exploit cognitive hierarchy theory and Bayesian learning techniques to continually evolve a model for predicting human player decisions, and let the AI player make decisions according to the model predictions to pursue the best chance of winning. Experimental results show that our AI algorithm beats 27 out of 30 volunteer human players.
Recent years have witnessed a growing interest in interactive narrative-centered virtual environments for education, training, and entertainment. Narrative environments dynamically craft engaging story-based experiences for users, who are themselves active participants in unfolding stories. A key challenge posed by interactive narrative is recognizing users' goals so that narrative planners can dynamically orchestrate plot elements and character actions to create rich, customized stories. In this paper we present an inductive approach to predicting users' goals by learning probabilistic goal recognition models. This approach has been evaluated in a narrative environment for the domain of microbiology in which the user plays the role of a medical detective solving a science mystery. An empirical evaluation of goal recognition based on n-gram models and Bayesian networks suggests that the models offer significant predictive power.