Undirected Networks
Faster saddle-point optimization for solving large-scale Markov decision processes
Bas-Serrano, Joan, Neu, Gergely
We consider the problem of computing optimal policies in average-r eward Markov decision processes. This classical problem can be formulated as a linear program dire ctly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the n umber of states. T o address this issue, recent work has considered a linearly relaxed version of the res ulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization pro blem by characterizing the conditions necessary for convergence to the optimal policy, and designing a n optimization algorithm enjoying fast convergence rates that are independent of the size of the state s pace.
Efficient Decision Making and Belief Space Planning using Sparse Approximations
Elimelech, Khen, Indelman, Vadim
In this work, we introduce a new approach for the efficient solution of autonomous decision and planning problems, with a special focus on decision making under uncertainty and belief space planning (BSP) in high-dimensional state spaces. Usually, to solve the decision problem, we identify the optimal action, according to some objective function. Instead, we claim that we can sometimes generate and solve an analogous yet simplified decision problem, which can be solved more efficiently. Furthermore, a wise simplification method can lead to the same action selection, or one for which the maximal loss can be guaranteed. This simplification is separated from the state inference, and does not compromise its accuracy, as the selected action would finally be applied on the original state. At first, we develop the concept for general decision problems, and provide a theoretical framework of definitions to allow a coherent discussion. We then practically apply these ideas to BSP problems, in which the problem is simplified by considering a sparse approximation of the initial belief. The scalable sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a highly realistic active-SLAM problem, and manage to significantly reduce computation time, with practically no loss in the quality of solution. This rigorous and fundamental work is conceptually novel, and holds numerous possible extensions.
Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture
Ladosz, Pawel, Ben-Iwhiwhu, Eseoghene, Hu, Yang, Ketz, Nicholas, Kolouri, Soheil, Krichmar, Jeffrey L., Pilly, Praveen, Soltoggio, Andrea
This paper introduces the modulated Hebbian plus Q network architecture (MOHQA) for solving challenging partially observable Markov decision processes (POMDPs) deep reinforcement learning problems with sparse rewards and confounding observations. The proposed architecture combines a deep Q-network (DQN), and a modulated Hebbian network with neural eligibility traces (MOHN). Bio-inspired neural traces are used to bridge temporal delays between actions and rewards. The purpose is to discover distal cause-effect relationships where confounding observations and sparse rewards cause standard RL algorithms to fail. Each of the two modules of the network (DQN and MOHN) is responsible for different aspects of learning. DQN learns low level features and control, while MOHN contributes to the high-level decisions by bridging rewards with past actions. The strength of the approach is to support a DQN standard framework when temporal difference errors are difficult to compute due to non-observable states. The system is tested on a set of generalized decision making problems encoded as decision tree graphs that deliver delayed rewards after key decision points and confounding observations. The simulations show that the proposed approach helps solve problems that are currently challenging for state-of-the-art deep reinforcement learning algorithms.
Uncertainty Quantification with Statistical Guarantees in End-to-End Autonomous Driving Control
Michelmore, Rhiannon, Wicker, Matthew, Laurenti, Luca, Cardelli, Luca, Gal, Yarin, Kwiatkowska, Marta
Uncertainty Quantification with Statistical Guarantees in End-to-End Autonomous Driving Control Rhiannon Michelmore 1, Matthew Wicker 1, Luca Laurenti 1, Luca Cardelli 1, Y arin Gal 1, Marta Kwiatkowska 1 Abstract -- Deep neural network controllers for autonomous driving have recently benefited from significant performance improvements, and have begun deployment in the real world. Prior to their widespread adoption, safety guarantees are needed on the controller behaviour that properly take account of the uncertainty within the model as well as sensor noise. Bayesian neural networks, which assume a prior over the weights, have been shown capable of producing such uncertainty measures, but properties surrounding their safety have not yet been quantified for use in autonomous driving scenarios. In this paper, we develop a framework based on a state-of-the-art simulator for evaluating end-to-end Bayesian controllers. In addition to computing pointwise uncertainty measures that can be computed in real time and with statistical guarantees, we also provide a method for estimating the probability that, given a scenario, the controller keeps the car safe within a finite horizon. We experimentally evaluate the quality of uncertainty computation by three Bayesian inference methods in different scenarios and show how the uncertainty measures can be combined and calibrated for use in collision avoidance. Our results suggest that uncertainty estimates can greatly aid decision making in autonomous driving. I NTRODUCTION Deep Neural Networks (DNNs) have seen a surge in popularity over the past decade, and their use has become widespread in many fields including safety-critical systems such as medical diagnosis and, in particular, autonomous cars. The latter have driven millions of miles without human intervention [1], [2], but offer few safety guarantees.
Nonparametric learning for impulse control problems
Christensen, Sören, Strauch, Claudia
One of the fundamental assumptions in stochastic control of continuous time processes is that the dynamics of the underlying (diffusion) process is known. This is, however, usually obviously not fulfilled in practice. On the other hand, over the last decades, a rich theory for nonparametric estimation of the drift (and volatility) for continuous time processes has been developed. The aim of this paper is bringing together techniques from stochastic control with methods from statistics for stochastic processes to find a way to both learn the dynamics of the underlying process and control in a reasonable way at the same time. More precisely, we study a long-term average impulse control problem, a stochastic version of the classical Faustmann timber harvesting problem. One of the problems that immediately arises is an exploration vs. exploitation-behavior as is well known for problems in machine learning. We propose a way to deal with this issue by combining exploration- and exploitation periods in a suitable way. Our main finding is that this construction can be based on the rates of convergence of estimators for the invariant density. Using this, we obtain that the average cumulated regret is of uniform order $O({T^{-1/3}})$.
Non-Parametric Structure Learning on Hidden Tree-Shaped Distributions
Nikolakakis, Konstantinos E., Kalogerias, Dionysios S., Sarwate, Anand D.
We provide high probability sample complexity guarantees for non-parametric structure learning of tree-shaped graphical models whose nodes are discrete random variables with a finite or countable alphabet, both in the noiseless and noisy regimes. First, we introduce a new, fundamental quantity called the (noisy) information threshold, which arises naturally from the error analysis of the Chow-Liu algorithm and characterizes not only the sample complexity, but also the inherent impact of the noise on the structure learning task, without explicit assumptions on the distribution of the model. This allows us to present the first non-parametric, high-probability finite sample complexity bounds on tree-structure learning from potentially noise-corrupted data. In particular, for number of nodes $p$, success rate $1-\delta$, and a fixed value of the information threshold, our sample complexity bounds for exact structure recovery are of the order of $\mathcal{O}\big(\log^{1+\zeta} (p/\delta)\big) $, for all $\zeta>0$, for both noiseless and noisy settings. Subsequently, we apply our results on two classes of hidden models, namely, the $M$-ary erasure channel and the generalized symmetric channel, illustrating the usefulness and importance of our framework. As a byproduct of our analysis, this paper resolves the open problem of tree structure learning in the presence of non-identically distributed observation noise, providing explicit conditions on the convergence of the Chow-Liu algorithm under this setting as well.
Reconnaissance and Planning algorithm for constrained MDP
Maeda, Shin-ichi, Watahiki, Hayato, Okada, Shintarou, Koyama, Masanori
Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we propose a novel simulator-based method to approximately solve a CMDP problem without making any compromise on the safety constraints. We achieve this by decomposing the CMDP into a pair of MDPs; reconnaissance MDP and planning MDP. The purpose of reconnaissance MDP is to evaluate the set of actions that are safe, and the purpose of planning MDP is to maximize the return while using the actions authorized by reconnaissance MDP. RMDP can define a set of safe policies for any given set of safety constraint, and this set of safe policies can be used to solve another CMDP problem with different reward. Our method is not only computationally less demanding than the previous simulator-based approaches to CMDP, but also capable of finding a competitive reward-seeking policy in a high dimensional environment, including those involving multiple moving obstacles.
Extracting Conceptual Knowledge from Natural Language Text Using Maximum Likelihood Principle
Sharma, Shipra, Sodhi, Balwinder
Domain-specific knowledge graphs constructed from natural language text are ubiquitous in today's world. In many such scenarios the base text, from which the knowledge graph is constructed, concerns itself with practical, on-hand, actual or ground-reality information about the domain. Product documentation in software engineering domain are one example of such base texts. Other examples include blogs and texts related to digital artifacts, reports on emerging markets and business models, patient medical records, etc. Though the above sources contain a wealth of knowledge about their respective domains, the conceptual knowledge on which they are based is often missing or unclear. Access to this conceptual knowledge can enormously increase the utility of available data and assist in several tasks such as knowledge graph completion, grounding, querying, etc. Our contributions in this paper are twofold. First, we propose a novel Markovian stochastic model for document generation from conceptual knowledge. The uniqueness of our approach lies in the fact that the conceptual knowledge in the writer's mind forms a component of the parameter set of our stochastic model. Secondly, we solve the inverse problem of learning the best conceptual knowledge from a given document, by finding model parameters which maximize the likelihood of generating the specific document over all possible parameter values. This likelihood maximization is done using an application of Baum-Welch algorithm, which is a known special case of Expectation-Maximization (EM) algorithm. We run our conceptualization algorithm on several well-known natural language sources and obtain very encouraging results. The results of our extensive experiments concur with the hypothesis that the information contained in these sources has a well-defined and rigorous underlying conceptual structure, which can be discovered using our method.
Automated and Interpretable Patient ECG Profiles for Disease Detection, Tracking, and Discovery
The ECG remains the most widely used diagnostic test for characterization of cardiac structure and electrical activity. We hypothesized that parallel advances in computing power, machine learning algorithms, and availability of large-scale data could substantially expand the clinical inferences derived from the ECG while at the same time preserving interpretability for medical decision-making. We identified 36 186 ECGs from the University of California, San Francisco database that would enable training of models for estimation of cardiac structure or function or detection of disease. We segmented the ECG into standard component waveforms and intervals using a novel combination of convolutional neural networks and hidden Markov models and evaluated this segmentation by comparing resulting electrical intervals against 141 864 measurements produced during the clinical workflow. We then built a patient-level ECG profile, a 725-element feature vector and used this profile to train and interpret machine learning models for examples of cardiac structure (left ventricular mass, left atrial volume, and mitral annulus e-prime) and disease (pulmonary arterial hypertension, hypertrophic cardiomyopathy, cardiac amyloid, and mitral valve prolapse).
Unsupervised Segmentation of Fire and Smoke from Infra-Red Videos
Ajith, Meenu, Martínez-Ramón, Manel
This paper proposes a vision-based fire and smoke segmentation system which use spatial, temporal and motion information to extract the desired regions from the video frames. The fusion of information is done using multiple features such as optical flow, divergence and intensity values. These features extracted from the images are used to segment the pixels into different classes in an unsupervised way. A comparative analysis is done by using multiple clustering algorithms for segmentation. Here the Markov Random Field performs more accurately than other segmentation algorithms since it characterizes the spatial interactions of pixels using a finite number of parameters. It builds a probabilistic image model that selects the most likely labeling using the maximum a posteriori (MAP) estimation. This unsupervised approach is tested on various images and achieves a frame-wise fire detection rate of 95.39%. Hence this method can be used for early detection of fire in real-time and it can be incorporated into an indoor or outdoor surveillance system.