Undirected Networks
Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning
Training deep reinforcement learning agents on environments with multiple levels / scenes / conditions from the same task, has become essential for many applications aiming to achieve generalization and domain transfer from simulation to the real world [1,2]. While such a strategy is helpful with generalization, the use of multiple scenes significantly increases the variance of samples collected for policy gradient computations. Current methods continue to view this collection of scenes as a single Markov Decision Process (MDP) with a common value function; however, we argue that it is better to treat the collection as a single environment with multiple underlying MDPs. To this end, we propose a dynamic value estimation (DVE) technique for these multiple-MDP environments, motivated by the clustering effect observed in the value function distribution across different scenes. The resulting agent is able to learn a more accurate and scene-specific value function estimate (and hence the advantage function), leading to a lower sample variance. Our proposed approach is simple to accommodate with several existing implementations (like PPO, A3C) and results in consistent improvements for a range of ProcGen environments and the AI2-THOR framework based visual navigation task.
Data-driven Efficient Solvers and Predictions of Conformational Transitions for Langevin Dynamics on Manifold in High Dimensions
Gao, Yuan, Liu, Jian-Guo, Wu, Nan
We work on dynamic problems with collected data $\{\mathsf{x}_i\}$ that distributed on a manifold $\mathcal{M}\subset\mathbb{R}^p$. Through the diffusion map, we first learn the reaction coordinates $\{\mathsf{y}_i\}\subset \mathcal{N}$ where $\mathcal{N}$ is a manifold isometrically embedded into an Euclidean space $\mathbb{R}^\ell$ for $\ell \ll p$. The reaction coordinates enable us to obtain an efficient approximation for the dynamics described by a Fokker-Planck equation on the manifold $\mathcal{N}$. By using the reaction coordinates, we propose an implementable, unconditionally stable, data-driven upwind scheme which automatically incorporates the manifold structure of $\mathcal{N}$. Furthermore, we provide a weighted $L^2$ convergence analysis of the upwind scheme to the Fokker-Planck equation. The proposed upwind scheme leads to a Markov chain with transition probability between the nearest neighbor points. We can benefit from such property to directly conduct manifold-related computations such as finding the optimal coarse-grained network and the minimal energy path that represents chemical reactions or conformational changes. To establish the Fokker-Planck equation, we need to acquire information about the equilibrium potential of the physical system on $\mathcal{N}$. Hence, we apply a Gaussian Process regression algorithm to generate equilibrium potential for a new physical system with new parameters. Combining with the proposed upwind scheme, we can calculate the trajectory of the Fokker-Planck equation on $\mathcal{N}$ based on the generated equilibrium potential. Finally, we develop an algorithm to pullback the trajectory to the original high dimensional space as a generative data for the new physical system.
Effective Learning of a GMRF Mixture Model
Finder, Shahaf E., Treister, Eran, Freifeld, Oren
Learning a Gaussian Mixture Model (GMM) is hard when the number of parameters is too large given the amount of available data. As a remedy, we propose restricting the GMM to a Gaussian Markov Random Field Mixture Model (GMRF-MM), as well as a new method for estimating the latter's sparse precision (i.e., inverse covariance) matrices. When the sparsity pattern of each matrix is known, we propose an efficient optimization method for the Maximum Likelihood Estimate (MLE) of that matrix. When it is unknown, we utilize the popular Graphical LASSO (GLASSO) to estimate that pattern. However, we show that even for a single Gaussian, when GLASSO is tuned to successfully estimate the sparsity pattern, it does so at the price of a substantial bias of the values of the nonzero entries of the matrix, and we show that this problem only worsens in a mixture setting. To overcome this, we discard the non-zero values estimated by GLASSO, keep only its pattern estimate and use it within the proposed MLE method. This yields an effective two-step procedure that removes the bias. We show that our "debiasing" approach outperforms GLASSO in both the single-GMRF and the GMRF-MM cases. We also show that when learning priors for image patches, our method outperforms GLASSO even if we merely use an educated guess about the sparsity pattern, and that our GMRF-MM outperforms the baseline GMM on real and synthetic high-dimensional datasets. Our code is available at \url{https://github.com/shahaffind/GMRF-MM}.
Hidden Markov Models and their Application for Predicting Failure Events
We show how Markov mixed membership models (MMMM) can be used to predict the degradation of assets. We model the degradation path of individual assets, to predict overall failure rates. Instead of a separate distribution for each hidden state, we use hierarchical mixtures of distributions in the exponential family. In our approach the observation distribution of the states is a finite mixture distribution of a small set of (simpler) distributions shared across all states. Using tied-mixture observation distributions offers several advantages. The mixtures act as a regularization for typically very sparse problems, and they reduce the computational effort for the learning algorithm since there are fewer distributions to be found. Using shared mixtures enables sharing of statistical strength between the Markov states and thus transfer learning. We determine for individual assets the trade-off between the risk of failure and extended operating hours by combining a MMMM with a partially observable Markov decision process (POMDP) to dynamically optimize the policy for when and how to maintain the asset.
#111 Machine Learning with TensorFlow with Chris Mattmann – Author / Manager, Chief Technology and Innovation Officer -- DATA FUTUROLOGY PODCAST
Chris Mattmann is the Deputy Chief Technology and Innovation Officer at NASA Jet Propulsion Lab, where he has been recognised as JPL's first Principal Scientist in the area of Data Science. Chris has applied TensorFlow to challenges he's faced at NASA, including building an implementation of Google's Show & Tell algorithm for image captioning using TensorFlow. He was involved in the Mars rover landing mission, where he was working in a planetary data system engineering node, helping to build a data management framework called object-oriented data technology to support capturing, processing and sharing of data for NASA's scientific archives. He contributes to open source as a former Director at the Apache Software Foundation, and teaches graduate courses at USC in Content Detection and Analysis, and in Search Engines and Information Retrieval. In this episode, Chris opens the show discussing his interest in data.
Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption
Luo, Hongyin, Li, Shang-Wen, Glass, James
Spoken dialog systems have seen applications in many domains, including medical for automatic conversational diagnosis. State-of-the-art dialog managers are usually driven by deep reinforcement learning models, such as deep Q networks (DQNs), which learn by interacting with a simulator to explore the entire action space since real conversations are limited. However, the DQN-based automatic diagnosis models do not achieve satisfying performances when adapted to new, unseen diseases with only a few training samples. In this work, we propose the Prototypical Q Networks (ProtoQN) as the dialog manager for the automatic diagnosis systems. The model calculates prototype embeddings with real conversations between doctors and patients, learning from them and simulator-augmented dialogs more efficiently. We create both supervised and few-shot learning tasks with the Muzhi corpus. Experiments showed that the ProtoQN significantly outperformed the baseline DQN model in both supervised and few-shot learning scenarios, and achieves state-of-the-art few-shot learning performances.
Safe Learning for Near Optimal Scheduling
Geeraerts, Gilles, Guha, Shibashis, Pérez, Guillermo A., Raskin, Jean-François
In this paper, we investigate the combination of synthesis techniques and learning techniques to obtain safe and near optimal schedulers for a preemptible task scheduling problem. We study both model-based learning techniques with PAC guarantees and model-free learning techniques based on shielded deep Q-learning. The new learning algorithms have been implemented to conduct experimental evaluations.
A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments
Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing and robotics. The real-world complications of many tasks arising in these domains makes them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms when the underlying assumption of stationary environment model is relaxed. This paper provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost during learning by RL agent or by finding a suitable policy for the RL agent which leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work along with their categorization and their relative merits and demerits. Additionally we also review works which are tailored to application domains. Finally, we discuss future enhancements for this field.
Applications of Probabilistic Programming (Master's thesis, 2015)
This thesis describes work on two applications of probabilistic programming: the learning of probabilistic program code given specifications, in particular program code of one-dimensional samplers; and the facilitation of sequential Monte Carlo inference with help of data-driven proposals. The latter is presented with experimental results on a linear Gaussian model and a non-parametric dependent Dirichlet process mixture of objects model for object recognition and tracking. In Chapter 1 we provide a brief introduction to probabilistic programming. In Chapter 2 we present an approach to automatic discovery of samplers in the form of probabilistic programs. We formulate a Bayesian approach to this problem by specifying a grammar-based prior over probabilistic program code. We use an approximate Bayesian computation method to learn the programs, whose executions generate samples that statistically match observed data or analytical characteristics of distributions of interest. In our experiments we leverage different probabilistic programming systems to perform Markov chain Monte Carlo sampling over the space of programs. Experimental results have demonstrated that, using the proposed methodology, we can learn approximate and even some exact samplers. Finally, we show that our results are competitive with regard to genetic programming methods. In Chapter 3, we describe a way to facilitate sequential Monte Carlo inference in probabilistic programming using data-driven proposals. In particular, we develop a distance-based proposal for the non-parametric dependent Dirichlet process mixture of objects model. We implement this approach in the probabilistic programming system Anglican, and show that for that model data-driven proposals provide significant performance improvements. We also explore the possibility of using neural networks to improve data-driven proposals.
Bridging the Gap Between Probabilistic Model Checking and Probabilistic Planning: Survey, Compilations, and Empirical Comparison
Klauck, Michaela (Saarland University, Saarland Informatics Campus) | Steinmetz, Marcel (Saarland University, CISPA Helmholtz Center for Information Security, Saarland Informatics Campus) | Hoffmann, Jörg (Saarland University, Saarland Informatics Campus) | Hermanns, Holger (Saarland University, Saarland Informatics Campus)
Markov decision processes are of major interest in the planning community as well as in the model checking community. But in spite of the similarity in the considered formal models, the development of new techniques and methods happened largely independently in both communities. This work is intended as a beginning to unite the two research branches. We consider goal-reachability analysis as a common basis between both communities. The core of this paper is the translation from Jani, an overarching input language for quantitative model checkers, into the probabilistic planning domain definition language (PPDDL), and vice versa from PPDDL into Jani. These translations allow the creation of an overarching benchmark collection, including existing case studies from the model checking community, as well as benchmarks from the international probabilistic planning competitions (IPPC). We use this benchmark set as a basis for an extensive empirical comparison of various approaches from the model checking community, variants of value iteration, and MDP heuristic search algorithms developed by the AI planning community. On a per benchmark domain basis, techniques from one community can achieve state-ofthe-art performance in benchmarks of the other community. Across all benchmark domains of one community, the performance comparison is however in favor of the solvers and algorithms of that particular community. Reasons are the design of the benchmarks, as well as tool-related limitations. Our translation methods and benchmark collection foster crossfertilization between both communities, pointing out specific opportunities for widening the scope of solvers to different kinds of models, as well as for exchanging and adopting algorithms across communities.