Learning Graphical Models
Optimizing Expectation with Guarantees in POMDPs
Chatterjee, Krishnendu (The Institute of Science and Technology Austria) | Novotný, Petr (The Institute of Science and Technology Austria) | Pérez, Guillermo A. (Université Libre de Bruxelles) | Raskin, Jean-François (Université Libre de Bruxelles) | Žikelić, Đorđe (University of Cambridge)
A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the “expectation” and “threshold” approaches and consider a “guaranteed payoff optimization (GPO)” problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.
Decentralized Planning in Stochastic Environments with Submodular Rewards
Kumar, Rajiv Ranjan (Singapore Management University) | Varakantham, Pradeep (Singapore Management University) | Kumar, Akshat (Singapore Management University)
Decentralized Markov Decision Process (Dec-MDP) provides a rich framework to represent cooperative decentralized and stochastic planning problems under transition uncertainty. However, solving a Dec-MDP to generate coordinated yet decentralized policies is NEXP-Hard. Researchers have made significant progress in providing approximate approaches to improve scalability with respect to number of agents. However, there has been little or no research devoted to finding guarantees on solution quality for approximate approaches considering multiple (more than 2 agents) agents. We have a similar situation with respect to the competitive decentralized planning problem and the Stochastic Game (SG) model. To address this, we identify models in the cooperative and competitive case that rely on submodular rewards, where we show that existing approximate approaches can provide strong quality guarantees ( a priori, and for cooperative case also posteriori guarantees). We then provide solution approaches and demonstrate improved online guarantees on benchmark problems from the literature for the cooperative case.
Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
Wang, Jinzhuo (Peking University) | Wang, Wenmin (Peking University ) | Wang, Ronggang (Peking University) | Gao, Wen (Peking University)
Monte Carlo tree search (MCTS) is extremely popular in computer Go which determines each action by enormous simulations in a broad and deep search tree. However, human experts select most actions by pattern analysis and careful evaluation rather than brute search of millions of future interactions. In this paper, we propose a computer Go system that follows experts’ way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates of next move. Compared with existing deep convolutional neural network (DCNN), DANN inserts recurrent layer after each convolutional layer and stacks them in an alternative manner. We show such setting can preserve more contexts of local features and its evolutions which are beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from move predictor. This is consistent with human experts’ nature of playing since they can foresee tens of steps to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions when local variations are settled. Combining criteria from the two parts, our system determines the optimal choice of next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of $253,233$ professional records. Experiments on GoGoD and PGD datasets show the DANN can substantially improve performance of move prediction over pure DCNN. When combining LTE, our system outperforms most relevant approaches and open engines based on MCTS.
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Taylor W. (Harvard University) | Konidaris, George (Brown University) | Doshi-Velez, Finale (Harvard University)
An intriguing application of transfer learning emerges when tasks arise with similar, but not identical, dynamics. Hidden Parameter Markov Decision Processes (HiP-MDP) embed these tasks into a low-dimensional space; given the embedding parameters one can identify the MDP for a particular task. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modeled independently of the agent's state uncertainty, requiring an arduous training procedure. In this work, we apply a Gaussian Process latent variable model to jointly model the dynamics and the embedding, leading to a more elegant formulation, one that allows for better uncertainty quantification and thus more robust transfer.
A Deep Learning Approach for Arabic Caption Generation Using Roots-Words
Jindal, Vasu (University of Texas)
Automatic caption generation is a key research field in the machine learning community. However, most of the current research is performed on English caption generation ignoring other languages like Arabic and Persian. In this paper, we propose a novel technique leveraging the heavy influence of root words in Arabic to automatically generate captions in Arabic. Fragments of the images are associated with root words and deep belief network pre-trained using Restricted Boltzmann Machines are used to extract words associated with image. Finally, dependency tree relations are used to generate sentence-captions by using the dependency on root words. Our approach is robust and attains BLEU-1 score of 34.8.
Semantic Inference of Bird Songs Using Dynamic Bayesian Networks
Daimon, Keisuke (Nagoya University) | Hedley, Richard W. ( University of California, Los Angeles ) | Taylor, Charles E. (University of California, Los Angeles)
Knowledge representation and natural language processing are core interests to the field of artificial intelligence (AI). While most research has been directed toward machines and humans, the principles and methods developed for AI might be extended to other species as well. Birds frequently behave in a manner that is intelligent and convey information in their vocalizations that is meaningful to others. In this paper we report on a method combining clustering and dynamic Bayesian networks to describe the semantics of songs among Cassin’s Vireos (Vireo cassinii), and show how behavioral contexts possibly affect bird song output.
Model AI Assignments 2017
Neller, Todd W. (Gettysburg College) | Eckroth, Joshua (Stetson University) | Reddy, Sravana (Wellesley College) | Ziegler, Joshua (Air Force Institute of Technology) | Bindewald, Jason (Air Force Institute of Technology) | Peterson, Gilbert (Air Force Institute of Technology) | Way, Thomas (Villanova University) | Matuszek, Paula (Villanova University) | Cassel, Lillian (Villanova University) | Papalaskari, Mary-Angela (Villanova University) | Weiss, Carol (Villanova University) | Anders, Ariel (Massachusetts Institute of Technology) | Karaman, Sertac (Massachusetts Institute of Technology)
The Model AI Assignments session seeks to gather and disseminate the best assignment designs of the Artificial Intelligence (AI) Education community. Recognizing that assignments form the core of student learning experience, we here present abstracts of six AI assignments from the 2017 session that are easily adoptable, playfully engaging, and flexible for a variety of instructor needs.
Fast-Tracking Stationary MOMDPs for Adaptive Management Problems
Péron, Martin (Queensland University of Technology, CSIRO) | Becker, Kai Helge (University of Strathclyde) | Bartlett, Peter (University of California, Berkeley) | Chadès, Iadine (Commonwealth Scientific and Industrial Research Organisation)
Adaptive management is applied in conservation and natural resource management, and consists of making sequential decisions when the transition matrix is uncertain. Informally described as ’learning by doing’, this approach aims to trade off between decisions that help achieve the objective and decisions that will yield a better knowledge of the true transition matrix. When the true transition matrix is assumed to be an element of a finite set of possible matrices, solving a mixed observability Markov decision process (MOMDP) leads to an optimal trade-off but is very computationally demanding. Under the assumption (common in adaptive management) that the true transition matrix is stationary, we propose a polynomial-time algorithm to find a lower bound of the value function. In the corners of the domain of the value function (belief space), this lower bound is provably equal to the optimal value function. We also show that under further assumptions, it is a linear approximation of the optimal value function in a neighborhood around the corners. We evaluate the benefits of our approach by using it to initialize the solvers MO-SARSOP and Perseus on a novel computational sustainability problem and a recent adaptive management data challenge. Our approach leads to an improved initial value function and translates into significant computational gains for both solvers.
Three New Algorithms to Solve N-POMDPs
Dujardin, Yann (Commonwealth Scientific and Industrial Research Organisation (CSIRO)) | Dietterich, Tom (Oregon State University) | Chadès, Iadine (Commonwealth Scientific and Industrial Research Organisation (CSIRO))
In many fields in computational sustainability, applications of POMDPs are inhibited by the complexity of the optimal solution. One way of delivering simple solutions is to represent the policy with a small number of alpha-vectors. We would like to find the best possible policy that can be expressed using a fixed number N of alpha-vectors. We call this the N-POMDP problem. The existing solver alpha-min approximately solves finite-horizon POMDPs with a controllable number of alpha-vectors. However alpha-min is a greedy algorithm without performance guarantees, and it is rather slow. This paper proposes three new algorithms, based on a general approach that we call alpha-min-2. These three algorithms are able to approximately solve N-POMDPs. Alpha-min-2-fast (heuristic) and alpha-min-2-p (with performance guarantees) are designed to complement an existing POMDP solver, while alpha-min-2-solve (heuristic) is a solver itself. Complexity results are provided for each of the algorithms, and they are tested on well-known benchmarks. These new algorithms will help users to interpret solutions to POMDP problems in computational sustainability.
Matrix Factorisation for Scalable Energy Breakdown
Batra, Nipun (IIIT Delhi) | Wang, Hongning (University of Virginia) | Singh, Amarjeet (IIIT Delhi) | Whitehouse, Kamin (University of Virginia)
Homes constitute more than one-thirds of the total energy consumption. Producing an energy breakdown for a home has been shown to reduce household energy consumption by up to 15%, among other benefits. However, existing approaches to produce an energy breakdown require hardware to be installed in each home and are thus prohibitively expensive. In this paper, we propose a novel application of feature-based matrix factorisation that does not require any additional hard- ware installation. The basic premise of our approach is that common design and construction patterns for homes create a repeating structure in their energy data. Thus, a sparse basis can be used to represent energy data from a broad range of homes. We evaluate our approach on 516 homes from a publicly available data set and find it to be more effective than five baseline approaches that either require sensing in each home, or a very rigorous survey across a large number of homes coupled with complex modelling. We also present a deployment of our system as a live web application that can potentially provide energy breakdown to millions of homes.