Undirected Networks
Blind Spot Detection for Safe Sim-to-Real Transfer
Ramakrishnan, Ramya (Massachusetts Institute of Technology) | Kamar, Ece | Dey, Debadeepta | Horvitz, Eric | Shah, Julie
Agents trained in simulation may make errors when performing actions in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult for the agent to discover because the agent is unable to predict them a priori. In this work, we propose the use of oracle feedback to learn a predictive model of these blind spots in order to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: when the agent lacks necessary features to represent the true state of the world, and thus cannot distinguish between numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. Our system learns models for predicting blind spots within unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. These models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach across two domains and demonstrate that it achieves higher predictive performance than baseline methods, and also that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how these biases influence the discovery of blind spots. Further, we include analyses of our approach that incorporate relaxed initial optimality assumptions. (Interestingly, relaxing the assumptions of an optimal oracle and an optimal simulator policy helped our models to perform better.) We also propose extensions to our method that are intended to improve performance when using corrections and demonstrations data.
Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise
Kaledin, Maxim, Moulines, Eric, Naumov, Alexey, Tadic, Vladislav, Wai, Hoi-To
The linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite-time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is $O(1/k)$, where $c > 1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $\Omega(1/k)$. A simple numerical experiment is presented to support our theory.
Keywords: stochastic approximation, reinforcement learning, GTD learning, Markovian noise
1. Introduction
Since its introduction close to 70 years ago, the stochastic approximation (SA) scheme (Robbins and Monro, 1951) has been a powerful tool for root finding when only noisy samples are available. During the past two decades, considerable progress has been made in the practical and theoretical research of SA; see (Benaïm, 1999; Kushner and Yin, 2003; Borkar, 2008) for an overview. Among others, linear SA schemes are popular in reinforcement learning (RL) as they lead to policy evaluation methods with linear function approximation. Of particular importance is temporal difference (TD) learning (Sutton, 1988), for which finite-time analysis has been reported in (Srikant and Ying, 2019; Lakshminarayanan and Szepesvari, 2018; Bhandari et al., 2018; Dalal et al., 2018a).
The TD learning scheme based on classical (linear) SA is known to be inadequate for the off-policy learning paradigms in RL, where data samples are drawn from a behavior policy different from the policy being evaluated (Baird, 1995; Tsitsiklis and Van Roy, 1997). To circumvent this [...] These methods fall within the scope of the linear two-timescale SA scheme introduced by Borkar (1997):
$$\theta_{k+1} = \theta_k + \beta_k \{ \tilde{b}_1(X_{k+1}) - \tilde{A}_{11}(X_{k+1}) \theta_k - \tilde{A}_{12}(X_{k+1}) w_k \}, \qquad (1)$$
$$w_{k+1} = w_k + \gamma_k \{ \tilde{b}_2(X_{k+1}) - \tilde{A}_{21}(X_{k+1}) \theta_k - \tilde{A}_{22}(X_{k+1}) w_k \}.$$
Authors listed in alphabetical order.
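As a rough illustration of the two-timescale recursion, the following Python sketch runs a coupled pair of linear updates driven by a two-state Markov chain. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `two_timescale_sa` and `stationary_point`, the step sizes ($\beta_k = 1/(k+10)$ for the slow iterate $\theta$ and $\gamma_k = 1/(k+10)^{2/3}$ for the fast iterate $w$), the noise model (only the $b$ terms are perturbed, by $\pm 0.5$), and the example matrices.

```python
import numpy as np


def stationary_point(A11, A12, A21, A22, b1, b2):
    """Solve the block linear system that the noiseless recursion targets."""
    d = b1.shape[0]
    M = np.block([[A11, A12], [A21, A22]])
    sol = np.linalg.solve(M, np.concatenate([b1, b2]))
    return sol[:d], sol[d:]


def two_timescale_sa(A11, A12, A21, A22, b1, b2, n_iter=20000, seed=0):
    """Linear two-timescale SA with Markovian noise (illustrative sketch).

    theta is updated with the smaller step size beta_k (slow timescale),
    w with the larger step size gamma_k (fast timescale).
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(b1, dtype=float)
    w = np.zeros_like(b2, dtype=float)
    x = 0          # state of a two-state Markov chain (hypothetical noise source)
    p_stay = 0.9   # sticky, symmetric chain: stationary distribution is uniform

    for k in range(n_iter):
        # Advance the Markov chain; consecutive samples are correlated (non-i.i.d.).
        if rng.random() > p_stay:
            x = 1 - x
        s = 1.0 if x == 1 else -1.0

        # Noisy observations whose stationary means are b1 and b2.
        b1_noisy = b1 + 0.5 * s
        b2_noisy = b2 + 0.5 * s

        beta = 1.0 / (k + 10)                  # slow step size
        gamma = 1.0 / (k + 10) ** (2.0 / 3.0)  # fast step size, beta/gamma -> 0

        # Both updates use the iterates from step k, as in the recursion.
        theta_next = theta + beta * (b1_noisy - A11 @ theta - A12 @ w)
        w_next = w + gamma * (b2_noisy - A21 @ theta - A22 @ w)
        theta, w = theta_next, w_next

    return theta, w


if __name__ == "__main__":
    A11, A12 = 2.0 * np.eye(2), 0.5 * np.eye(2)
    A21, A22 = 0.3 * np.eye(2), np.eye(2)
    b1, b2 = np.array([1.0, -1.0]), np.array([0.5, 2.0])
    theta, w = two_timescale_sa(A11, A12, A21, A22, b1, b2)
    theta_star, w_star = stationary_point(A11, A12, A21, A22, b1, b2)
    print("theta error:", np.linalg.norm(theta - theta_star))
    print("w error:", np.linalg.norm(w - w_star))
```

With these example matrices both $A_{22}$ and $A_{11} - A_{12} A_{22}^{-1} A_{21}$ have positive eigenvalues, so the iterates should settle near the solution of the block linear system; the sketch makes no attempt to reproduce the paper's rates or constants.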
tfp.mcmc: Modern Markov Chain Monte Carlo Tools Built for Modern Hardware
Lao, Junpeng, Suter, Christopher, Langmore, Ian, Chimisov, Cyril, Saxena, Ashish, Sountsov, Pavel, Moore, Dave, Saurous, Rif A., Hoffman, Matthew D., Dillon, Joshua V.
Markov chain Monte Carlo (MCMC) is widely regarded as one of the most important algorithms of the 20th century. Its guarantees of asymptotic convergence, stability, and estimator-variance bounds using only unnormalized probability functions make it indispensable to probabilistic programming. In this paper, we introduce the TensorFlow Probability MCMC toolkit, and discuss some of the considerations that motivated its design.
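To make concrete the point that MCMC needs only an unnormalized probability function, here is a minimal random-walk Metropolis sampler in plain NumPy. It is a sketch for illustration and does not use or mirror the tfp.mcmc API; the names `random_walk_metropolis` and `unnormalized_log_normal`, the step size, and the Gaussian target are all assumptions.

```python
import numpy as np


def random_walk_metropolis(log_prob, x0, n_samples, step=1.0, seed=0):
    """Draw samples from a 1-D density known only up to a constant.

    log_prob: callable returning the unnormalized log density at a point.
    Returns the chain and the fraction of accepted proposals.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    lp = log_prob(x)
    samples = np.empty(n_samples)
    n_accept = 0
    for i in range(n_samples):
        proposal = x + step * rng.normal()   # symmetric Gaussian proposal
        lp_prop = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); the unknown
        # normalizing constant cancels in this ratio.
        if rng.random() < np.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
            n_accept += 1
        samples[i] = x
    return samples, n_accept / n_samples


def unnormalized_log_normal(x, mean=3.0, std=2.0):
    """Log density of a Gaussian with its normalizing constant dropped."""
    return -0.5 * ((x - mean) / std) ** 2


if __name__ == "__main__":
    chain, rate = random_walk_metropolis(unnormalized_log_normal, 0.0, 20000, step=2.5)
    kept = chain[2000:]  # discard burn-in
    print("mean:", kept.mean(), "std:", kept.std(), "accept rate:", rate)
```

A production toolkit would add gradient-based kernels, step-size adaptation, batching across chains, and convergence diagnostics, none of which are shown here.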
DALC: Distributed Automatic LSTM Customization for Fine-Grained Traffic Speed Prediction
Lee, Ming-Chang, Lin, Jia-Chun
Over the past decade, several approaches have been introduced for short-term traffic prediction. However, providing fine-grained traffic prediction for large-scale transportation networks where numerous detectors are geographically deployed to collect traffic data is still an open issue. To address this issue, in this paper, we formulate the problem of customizing an LSTM model for a single detector into a finite Markov decision process and then introduce an Automatic LSTM Customization (ALC) algorithm to automatically customize an LSTM model for a single detector such that the corresponding prediction accuracy can be as satisfactory as possible and the time consumption can be as low as possible. Based on the ALC algorithm, we introduce a distributed approach called Distributed Automatic LSTM Customization (DALC) to customize an LSTM model for every detector in large-scale transportation networks. Our experiment demonstrates that DALC provides higher prediction accuracy than several approaches provided by Apache Spark MLlib.
Effectively Trainable Semi-Quantum Restricted Boltzmann Machine
Lyakhova, Ya. S., Polyakov, E. A., Rubtsov, A. N.
We propose a novel quantum model for the restricted Boltzmann machine (RBM), in which the visible units remain classical whereas the hidden units are quantized as noninteracting fermions. The free motion of the fermions is parametrically coupled to the classical signal of the visible units. This model exhibits quantum behaviour, such as coherences between the hidden units. Numerical experiments show that this makes it more powerful than the classical RBM with the same number of hidden units. At the same time, a significant advantage of the proposed model over other approaches to the Quantum Boltzmann Machine (QBM) is that it is exactly solvable and efficiently trainable on a classical computer: there is a closed-form expression for the log-likelihood gradient with respect to its parameters. This makes it interesting not only as a model of a hypothetical quantum simulator, but also as a quantum-inspired classical machine-learning algorithm.
Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks
Walsh, Jonathan R., Smith, Aaron M., Pouliot, Yannick, Li-Bland, David, Loukianov, Anton, Fisher, Charles K.
Multiple Sclerosis (MS) is a neurodegenerative disorder characterized by a complex set of clinical assessments. We use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to learn the relationships between covariates commonly used to characterize subjects and their disease progression in MS clinical trials. A CRBM is capable of generating digital twins, which are simulated subjects having the same baseline data as actual subjects. Digital twins allow for subject-level statistical analyses of disease progression. The CRBM is trained using data from 2395 subjects enrolled in the placebo arms of clinical trials across the three primary subtypes of MS. We discuss how CRBMs are trained and show that digital twins generated by the model are statistically indistinguishable from their actual subject counterparts along a number of measures.
Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections
Bobu, Andreea, Bajcsy, Andrea, Fisac, Jaime F., Deglurkar, Sampada, Dragan, Anca D.
Human input has enabled autonomous systems to improve their capabilities and achieve complex behaviors that are otherwise challenging to generate automatically. Recent work focuses on how robots can use such input, like demonstrations or corrections, to learn intended objectives. These techniques assume that the human's desired objective already exists within the robot's hypothesis space. In reality, this assumption is often inaccurate: there will always be situations where the person might care about aspects of the task that the robot does not know about. Without this knowledge, the robot cannot infer the correct objective. Hence, when the robot's hypothesis space is misspecified, even methods that keep track of uncertainty over the objective fail because they reason about which hypothesis might be correct, and not whether any of the hypotheses are correct. In this paper, we posit that the robot should reason explicitly about how well it can explain human inputs given its hypothesis space and use that situational confidence to inform how it should incorporate human input. We demonstrate our method on a 7-degree-of-freedom robot manipulator in learning from two important types of human input: demonstrations of manipulation tasks, and physical corrections during the robot's task execution.
Torch-Struct: Deep Structured Prediction Library
The literature on structured prediction for NLP describes a rich collection of distributions and algorithms over sequences, segmentations, alignments, and trees; however, these algorithms are difficult to utilize in deep learning frameworks. We introduce Torch-Struct, a library for structured prediction designed to take advantage of and integrate with vectorized, auto-differentiation based frameworks. Torch-Struct includes a broad collection of probabilistic structures accessed through a simple and flexible distribution-based API that connects to any deep learning model. The library utilizes batched, vectorized operations and exploits auto-differentiation to produce readable, fast, and testable code. Internally, we also include a number of general-purpose optimizations to provide cross-algorithm efficiency. Experiments show significant performance gains over fast baselines and case-studies demonstrate the benefits of the library.
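As an illustration of the kind of batched, vectorized dynamic program such a library provides, the sketch below computes the log-partition function of a linear-chain model with the forward algorithm in NumPy and checks it against brute-force enumeration. It does not use the Torch-Struct API; the function names and the `(batch, T, C, C)` layout of the log-potentials, where entry `[b, t, i, j]` scores label `i` at position `t` followed by label `j` at position `t + 1`, are assumptions made for this example.

```python
import itertools

import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def log_partition(log_potentials):
    """Forward algorithm over a batch of linear chains.

    log_potentials: array of shape (batch, T, C, C).
    Returns an array of shape (batch,) with the log-partition values.
    """
    batch, T, C, _ = log_potentials.shape
    alpha = np.zeros((batch, C))  # log-score of each label at position 0
    for t in range(T):
        # (batch, C, 1) + (batch, C, C): sum out the previous label i.
        alpha = logsumexp(alpha[:, :, None] + log_potentials[:, t], axis=1)
    return logsumexp(alpha, axis=1)


def brute_force_log_partition(log_potentials):
    """Enumerate every label sequence; only feasible for tiny problems."""
    batch, T, C, _ = log_potentials.shape
    out = np.empty(batch)
    for b in range(batch):
        scores = [
            sum(log_potentials[b, t, seq[t], seq[t + 1]] for t in range(T))
            for seq in itertools.product(range(C), repeat=T + 1)
        ]
        out[b] = np.log(np.sum(np.exp(scores)))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    potentials = rng.normal(size=(2, 3, 3, 3))
    print(log_partition(potentials))
    print(brute_force_log_partition(potentials))
```

The forward pass costs on the order of T·C² operations per batch element, whereas enumeration grows as C^(T+1). In an auto-differentiation framework, differentiating the log-partition with respect to the log-potentials yields the edge marginals, which is one reason such libraries build on auto-differentiation.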
Automatic structured variational inference
Ambrogioni, Luca, Hinne, Max, van Gerven, Marcel
The aim of probabilistic programming is to automate every aspect of probabilistic inference in arbitrary probabilistic models (programs) so that the user can focus her attention on modeling, without dealing with ad-hoc inference methods. Gradient-based automatic differentiation stochastic variational inference offers an attractive option as the default method for (differentiable) probabilistic programming as it combines high performance with high computational efficiency. However, the performance of any (parametric) variational approach depends on the choice of an appropriate variational family. Here, we introduce a fully automatic method for constructing structured variational families inspired by the closed-form update in conjugate models. These pseudo-conjugate families incorporate the forward pass of the input probabilistic program and can capture complex statistical dependencies. Pseudo-conjugate families have the same space and time complexity as the input probabilistic program and are therefore tractable in a very large class of models. We validate our automatic variational method on a wide range of high-dimensional inference problems including deep learning components.
Deep Learning (Interview With Dong Yu)
Dr. Dong Yu is a principal researcher at Microsoft Research. His research has focused on speech recognition and applications of machine learning techniques. He has published two monographs and over 150 papers in these areas and is the inventor/co-inventor of nearly 60 granted/pending patents. His recent work on the context-dependent deep neural network hidden Markov model (CD-DNN-HMM), which was recognized by the IEEE SPS 2013 best paper award, caused a paradigm shift in large-vocabulary speech recognition. Dr. Dong Yu is currently serving as a member of the IEEE Speech and Language Processing Technical Committee (2013-).