Undirected Networks
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Taylor W. (Harvard University) | Konidaris, George (Brown University) | Doshi-Velez, Finale (Harvard University)
An intriguing application of transfer learning emerges when tasks arise with similar, but not identical, dynamics. Hidden Parameter Markov Decision Processes (HiP-MDP) embed these tasks into a low-dimensional space; given the embedding parameters one can identify the MDP for a particular task. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modeled independently of the agent's state uncertainty, requiring an arduous training procedure. In this work, we apply a Gaussian Process latent variable model to jointly model the dynamics and the embedding, leading to a more elegant formulation, one that allows for better uncertainty quantification and thus more robust transfer.
Model AI Assignments 2017
Neller, Todd W. (Gettysburg College) | Eckroth, Joshua (Stetson University) | Reddy, Sravana (Wellesley College) | Ziegler, Joshua (Air Force Institute of Technology) | Bindewald, Jason (Air Force Institute of Technology) | Peterson, Gilbert (Air Force Institute of Technology) | Way, Thomas (Villanova University) | Matuszek, Paula (Villanova University) | Cassel, Lillian (Villanova University) | Papalaskari, Mary-Angela (Villanova University) | Weiss, Carol (Villanova University) | Anders, Ariel (Massachusetts Institute of Technology) | Karaman, Sertac (Massachusetts Institute of Technology)
The Model AI Assignments session seeks to gather and disseminate the best assignment designs of the Artificial Intelligence (AI) Education community. Recognizing that assignments form the core of student learning experience, we here present abstracts of six AI assignments from the 2017 session that are easily adoptable, playfully engaging, and flexible for a variety of instructor needs.
Fast-Tracking Stationary MOMDPs for Adaptive Management Problems
Pรฉron, Martin (Queensland University of Technology, CSIRO) | Becker, Kai Helge (University of Strathclyde) | Bartlett, Peter (University of California, Berkeley) | Chadรจs, Iadine (Commonwealth Scientific and Industrial Research Organisation)
Adaptive management is applied in conservation and natural resource management, and consists of making sequential decisions when the transition matrix is uncertain. Informally described as โlearning by doingโ, this approach aims to trade off between decisions that help achieve the objective and decisions that will yield a better knowledge of the true transition matrix. When the true transition matrix is assumed to be an element of a finite set of possible matrices, solving a mixed observability Markov decision process (MOMDP) leads to an optimal trade-off but is very computationally demanding. Under the assumption (common in adaptive management) that the true transition matrix is stationary, we propose a polynomial-time algorithm to find a lower bound of the value function. In the corners of the domain of the value function (belief space), this lower bound is provably equal to the optimal value function. We also show that under further assumptions, it is a linear approximation of the optimal value function in a neighborhood around the corners. We evaluate the benefits of our approach by using it to initialize the solvers MO-SARSOP and Perseus on a novel computational sustainability problem and a recent adaptive management data challenge. Our approach leads to an improved initial value function and translates into significant computational gains for both solvers.
Three New Algorithms to Solve N-POMDPs
Dujardin, Yann (Commonwealth Scientific and Industrial Research Organisation (CSIRO)) | Dietterich, Tom (Oregon State University) | Chadรจs, Iadine (Commonwealth Scientific and Industrial Research Organisation (CSIRO))
In many fields in computational sustainability, applications of POMDPs are inhibited by the complexity of the optimal solution. One way of delivering simple solutions is to represent the policy with a small number of alpha-vectors. We would like to find the best possible policy that can be expressed using a fixed number N of alpha-vectors. We call this the N-POMDP problem. The existing solver alpha-min approximately solves finite-horizon POMDPs with a controllable number of alpha-vectors. However alpha-min is a greedy algorithm without performance guarantees, and it is rather slow. This paper proposes three new algorithms, based on a general approach that we call alpha-min-2. These three algorithms are able to approximately solve N-POMDPs. Alpha-min-2-fast (heuristic) and alpha-min-2-p (with performance guarantees) are designed to complement an existing POMDP solver, while alpha-min-2-solve (heuristic) is a solver itself. Complexity results are provided for each of the algorithms, and they are tested on well-known benchmarks. These new algorithms will help users to interpret solutions to POMDP problems in computational sustainability.
Matrix Factorisation for Scalable Energy Breakdown
Batra, Nipun (IIIT Delhi) | Wang, Hongning (University of Virginia) | Singh, Amarjeet (IIIT Delhi) | Whitehouse, Kamin (University of Virginia)
Homes constitute more than one-thirds of the total energy consumption. Producing an energy breakdown for a home has been shown to reduce household energy consumption by up to 15%, among other benefits. However, existing approaches to produce an energy breakdown require hardware to be installed in each home and are thus prohibitively expensive. In this paper, we propose a novel application of feature-based matrix factorisation that does not require any additional hard- ware installation. The basic premise of our approach is that common design and construction patterns for homes create a repeating structure in their energy data. Thus, a sparse basis can be used to represent energy data from a broad range of homes. We evaluate our approach on 516 homes from a publicly available data set and find it to be more effective than five baseline approaches that either require sensing in each home, or a very rigorous survey across a large number of homes coupled with complex modelling. We also present a deployment of our system as a live web application that can potentially provide energy breakdown to millions of homes.
When Does Bounded-Optimal Metareasoning Favor Few Cognitive Systems?
Milli, Smitha (University of California, Berkeley) | Lieder, Falk (University of California, Berkeley) | Griffiths, Thomas L. (University of California, Berkeley)
While optimal metareasoning is notoriously intractable, humans are nonetheless able to adaptively allocate their computational resources. A possible approximation that humans may use to do this is to only metareason over a finite set of cognitive systems that perform variable amounts of computation. The highly influential "dual-process" accounts of human cognition, which postulate the coexistence of a slow accurate system with a fast error-prone system, can be seen as a special case of this approximation. This raises two questions: how many cognitive systems should a bounded optimal agent be equipped with and what characteristics should those systems have? We investigate these questions in two settings: a one-shot decision between two alternatives, and planning under uncertainty in a Markov decision process. We find that the optimal number of systems depends on the variability of the environment and the costliness of metareasoning. Consistent with dual-process theories, we also find that when having two systems is optimal, then the first system is fast but error-prone and the second system is slow but accurate.
Leveraging Saccades to Learn Smooth Pursuit: A Self-Organizing Motion Tracking Model Using Restricted Boltzmann Machines
Yogeswaran, Arjun (University of Ottawa) | Payeur, Pierre (University of Ottawa)
In this paper, we propose a biologically-plausible model to explain the emergence of motion tracking behaviour in early development using unsupervised learning. The model's training is biased by a concept called retinal constancy, which measures how similar visual contents are between successive frames. This biasing is similar to a reward in reinforcement learning, but is less explicit, as it modulates the model's learning rate instead of being a learning signal itself. The model is a two-layer deep network. The first layer learns to encode visual motion, and the second layer learns to relate that motion to gaze movements, which it perceives and creates through bi-directional nodes. By randomly generating gaze movements to traverse the local visual space, desirable correlations are developed between visual motion and the appropriate gaze to nullify that motion such that maximal retinal constancy is achieved. Biologically, this is similar to using saccades to look around and learning from moments where a target and the saccade move together such that the image stays the same on the retina, and developing smooth pursuit behaviour to perform this action in the future. Restricted Boltzmann machines are used to implement this model because they can form a deep belief network, perform online learning, and act generatively. These properties all have biological equivalents and coincide with the biological plausibility of using saccades as leverage to learn smooth pursuit. This method is unique because it uses general machine learning algorithms, and their inherent generative properties, to learn from real-world data. It also implements a biological theory, uses motion instead of recognition via local searches, without temporal filtering, and learns in a fully unsupervised manner. Its tracking performance after being trained on real-world images with simulated motion is compared to its tracking performance after being trained on natural video. Results show that this model is able to successfully follow targets in natural video, despite partial occlusions, scale changes, and nonlinear motion.
RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem
Shekarpour, Saeedeh (Wright State University) | Marx, Edgard (Universitรคt Leipzig) | Auer, Sรถren (University of Bonn) | Sheth, Amit (Wright State University)
For non-expert users, a textual query is the most popular and simple means for communicating with a retrieval or question answering system.However, there is a risk of receiving queries which do not match with the background knowledge.Query expansion and query rewriting are solutions for this problem but they are in danger of potentially yielding a large number of irrelevant words, which in turn negatively influences runtime as well as accuracy.In this paper, we propose a new method for automatic rewriting input queries on graph-structured RDF knowledge bases.We employ a Hidden Markov Model to determine the most suitable derived words from linguistic resources.We introduce the concept of triple-based co-occurrence for recognizing co-occurred words in RDF data.This model was bootstrapped with three statistical distributions.Our experimental study demonstrates the superiority of the proposed approach to the traditional n-gram model.
Dynamically Constructed (PO)MDPs for Adaptive Robot Planning
Zhang, Shiqi (Cleveland State University) | Khandelwal, Piyush (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
To operate in human-robot coexisting environments, intelligent robots need to simultaneously reason with commonsense knowledge and plan under uncertainty. Markov decision processes (MDPs) and partially observable MDPs (POMDPs), are good at planning under uncertainty toward maximizing long-term rewards; P-LOG, a declarative programming language under Answer Set semantics, is strong in commonsense reasoning. In this paper, we present a novel algorithm called iCORPP to dynamically reason about, and construct (PO)MDPs using P-LOG. iCORPP successfully shields exogenous domain attributes from (PO)MDPs, which limits computational complexity and enables (PO)MDPs to adapt to the value changes these attributes produce. We conduct a number of experimental trials using two example problems in simulation and demonstrate iCORPP on a real robot. Results show significant improvements compared to competitive baselines.
I See What You See: Inferring Sensor and Policy Models of Human Real-World Motor Behavior
Schmitt, Felix (Robert Bosch GmbH) | Bieg, Hans-Joachim (Robert Bosch GmbH) | Herman, Michael (Robert Bosch GmbH) | Rothkopf, Constantin A. (Technical University Darmstadt)
Human motor behavior is naturally guided by sensing the environment. To predict such sensori-motor behavior, it is necessary to model what is sensed and how actions are chosen based on the obtained sensory measurements. Although several models of human sensing haven been proposed, rarely data of the assumed sensory measurements is available. This makes statistical estimation of sensor models problematic. To overcome this issue, we propose an abstract structural estimation approach building on the ideas of Herman et al.'s Simultaneous Estimation of Rewards and Dynamics (SERD). Assuming optimal fusion of sensory information and rational choice of actions the proposed method allows to infer sensor models even in absence of data of the sensory measurements. To the best of our knowledge, this work presents the first general approach for joint inference of sensor and policy models. Furthermore, we consider its concrete implementation in the important class of sensor scheduling linear quadratic Gaussian problems. Finally, the effectiveness of the approach is demonstrated for prediction of the behavior of automobile drivers. Specifically, we model the glance and steering behavior of driving in the presence of visually demanding secondary tasks. The results show, that prediction benefits from the inference of sensor models. This is the case, especially, if also information is considered, that is contained in gaze switching behavior.