Directed Networks
Deep Bayesian Uncertainty Estimation for Adaptation and Self-Annotation of Food Packaging Images
Ribeiro, Fabio De Sousa, Caliva, Francesco, Swainson, Mark, Gudmundsson, Kjartan, Leontidis, Georgios, Kollias, Stefanos
Food packaging labels provide important information for public health, such as allergens and use-by dates. Off-the-shelf Optical Character Verification (OCV) systems are good solutions for automating food label quality assessments, but are known to underperform on complex data. This paper proposes a Deep Learning-based system that can identify inadequate images for OCV, due to their poor label quality, by employing state-of-the-art Convolutional Neural Network (CNN) architectures, and practical Bayesian inference techniques for automatic self-annotation. We propose a practical domain adaptation procedure based on k-means clustering of CNN latent variables, followed by a k-Nearest Neighbour classification for handling high label variability between different dataset distributions. Moreover, Supervised Learning has proven useful in such systems but manual annotation of large amounts of data is usually required. This is practically intractable in most real-world problems due to time/labour constraints. In an attempt to address this issue, we introduce a self-annotating prediction model based on Self-Training of a Bayesian CNN, that leverages modern variational inference methods of deep models. In this context, we propose a new inverse uncertainty weighting technique that encourages the Self-Training model to learn from more informative data over time, potentially preventing it from becoming lazy by only selecting easy examples to learn from. An experimental study is presented illustrating the superior performance of the proposed approach over standard Self-Training, and highlighting the importance of predictive uncertainty estimates in safety-critical domains.
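The abstract above relies on predictive uncertainty estimates from a Bayesian CNN but does not say how they are computed. As a minimal sketch, assuming the uncertainty is taken as the entropy of class probabilities averaged over several stochastic forward passes (a common choice, not confirmed by the abstract), the quantity could look like this. The function name `predictive_entropy` and the input layout are hypothetical, and the paper's inverse uncertainty weighting itself is not reproduced here.

```python
import math


def predictive_entropy(mc_probs):
    """Entropy (in nats) of the class distribution averaged over stochastic passes.

    mc_probs: list of T probability vectors, one per stochastic forward pass,
    all over the same set of classes. (Hypothetical input layout.)
    """
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    # Average the predicted probability of each class across the T passes.
    mean_probs = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    # Shannon entropy of the averaged distribution; zero-probability terms contribute 0.
    return -sum(p * math.log(p) for p in mean_probs if p > 0.0)
```

Passes that agree on a confident prediction give an entropy near zero, while passes that disagree push the averaged distribution towards uniform and the entropy towards its maximum.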
What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning
Li, Irene, Fabbri, Alexander R., Tung, Robert R., Radev, Dragomir R.
Recent years have witnessed the rising popularity of Natural Language Processing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of "what should one learn first," we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce LectureBank, a dataset containing 1,352 English lecture files collected from university courses which are each classified according to an existing taxonomy as well as 208 manually-labeled prerequisite relation topics, which is publicly available. The dataset will be useful for educational purposes such as lecture preparation and organization as well as applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.
A Model-Based Reinforcement Learning Approach for a Rare Disease Diagnostic Task
Besson, Rémi, Pennec, Erwan Le, Allassonnière, Stéphanie, Stirnemann, Julien, Spaggiari, Emmanuel, Neuraz, Antoine
In this work, we present our various contributions to the objective of building a decision support tool for the diagnosis of rare diseases. Our goal is to achieve a state of knowledge where the uncertainty about the patient's disease is below a predetermined threshold. We aim to reach such states while minimizing the average number of medical tests to perform. In doing so, we take into account the need, in many medical applications, to avoid, as much as possible, any misdiagnosis. To solve this optimization task, we investigate several reinforcement learning algorithms and make them operable in our high-dimensional and sparse-reward setting. We also present a way to combine expert knowledge, expressed as conditional probabilities, with real clinical data. This is crucial because the scarcity of data in the field of rare diseases prevents any approach based solely on clinical data. Finally, we show that it is possible to integrate the ontological information about symptoms while remaining in our probabilistic reasoning. It enables our decision support tool to process information given at different levels of precision by the user.
Connecting the Dots Between MLE and RL for Sequence Generation
Tan, Bowen, Hu, Zhiting, Yang, Zichao, Salakhutdinov, Ruslan, Xing, Eric
Sequence generation models such as recurrent networks can be trained with a diverse set of learning algorithms. For example, maximum likelihood learning is simple and efficient, yet suffers from the exposure bias problem. Reinforcement learning like policy gradient addresses the problem but can have prohibitively poor exploration efficiency. A variety of other algorithms such as RAML, SPG, and data noising, have also been developed from different perspectives. This paper establishes a formal connection between these algorithms. We present a generalized entropy regularized policy optimization formulation, and show that the apparently divergent algorithms can all be reformulated as special instances of the framework, with the only difference being the configurations of reward function and a couple of hyperparameters. The unified interpretation offers a systematic view of the varying properties of exploration and learning efficiency. Besides, based on the framework, we present a new algorithm that dynamically interpolates among the existing algorithms for improved learning. Experiments on machine translation and text summarization demonstrate the superiority of the proposed algorithm.
Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior
Wang, Zi, Kim, Beomjoon, Kaelbling, Leslie Pack
Bayesian optimization usually assumes that a Bayesian prior is given. However, the strong theoretical guarantees in Bayesian optimization are often regrettably compromised in practice because of unknown parameters in the prior. In this paper, we adopt a variant of empirical Bayes and show that, by estimating the Gaussian process prior from offline data sampled from the same prior and constructing unbiased estimators of the posterior, variants of both GP-UCB and probability of improvement achieve a near-zero regret bound, which decreases to a constant proportional to the observational noise as the number of offline data and the number of online evaluations increase. Empirically, we have verified our approach on challenging simulated robotic problems featuring task and motion planning.
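For readers unfamiliar with GP-UCB, the selection rule it builds on can be sketched as follows. This is only the standard upper-confidence-bound acquisition step, assuming posterior means and standard deviations at the candidate points are already available. It does not implement the paper's estimation of the prior from offline data or its unbiased posterior estimators, and the names `ucb_scores`, `select_next` and `beta` are illustrative.

```python
import math


def ucb_scores(means, stds, beta):
    """Upper confidence bound for each candidate: mean + sqrt(beta) * std."""
    return [m + math.sqrt(beta) * s for m, s in zip(means, stds)]


def select_next(means, stds, beta):
    """Index of the candidate with the highest upper confidence bound."""
    scores = ucb_scores(means, stds, beta)
    return max(range(len(scores)), key=scores.__getitem__)
```

A larger `beta` favours candidates with high posterior uncertainty (exploration); with `beta = 0` the rule reduces to picking the highest posterior mean.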
Surrogate-assisted parallel tempering for Bayesian neural learning
Chandra, Rohitash, Jain, Konark, Kapoor, Arpit
Parallel tempering addresses some of the drawbacks of canonical Markov Chain Monte-Carlo methods for Bayesian neural learning with the ability to utilize high performance computing. However, certain challenges remain given the large range of network parameters and big data. Surrogate-assisted optimization considers the estimation of an objective function for models given computational inefficiency or difficulty in obtaining clear results. We address the inefficiency of parallel tempering for large-scale problems by combining parallel computing features with surrogate-assisted estimation of the likelihood function, which describes the plausibility of a model parameter value given specific observed data. In this paper, we present surrogate-assisted parallel tempering for Bayesian neural learning where the surrogates are used to estimate the likelihood. Estimation via the surrogate is useful as an alternative to evaluating computationally expensive models that feature a large number of parameters and datasets. Our results demonstrate that the methodology significantly lowers the computational cost while maintaining quality in decision making using Bayesian neural learning. The method has applications in Bayesian inversion and uncertainty quantification for a broad range of numerical models.
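The replica-exchange step at the core of parallel tempering can be sketched as below. The sketch assumes each chain k targets the prior times the likelihood raised to an inverse temperature beta_k, which is a common setup but is not stated in the abstract. The surrogate is not modelled here: the log-likelihood inputs could come from either the true model or a surrogate estimate. The function name is hypothetical.

```python
import math


def swap_accept_prob(loglik_i, loglik_j, beta_i, beta_j):
    """Metropolis probability of exchanging the states of two tempered chains.

    Assumes chain k targets prior(theta) * likelihood(theta) ** beta_k, so the
    prior terms cancel and only the log-likelihoods of the two states matter.
    """
    log_ratio = (beta_i - beta_j) * (loglik_j - loglik_i)
    # Cap at 0 before exponentiating so the result stays in [0, 1].
    return math.exp(min(0.0, log_ratio))
```

A swap that moves the higher-likelihood state into the colder chain (larger beta) is always accepted, while the reverse swap is accepted only with probability below one.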
Sequential Neural Methods for Likelihood-free Inference
Durkan, Conor, Papamakarios, George, Murray, Iain
Likelihood-free inference refers to inference when a likelihood function cannot be explicitly evaluated, which is often the case for models based on simulators. While much of the literature is concerned with sample-based 'Approximate Bayesian Computation' methods, recent work suggests that approaches relying on deep neural conditional density estimators can obtain state-of-the-art results with fewer simulations. The neural approaches vary in how they choose which simulations to run and what they learn: an approximate posterior or a surrogate likelihood. This work provides some direct controlled comparisons between these choices.
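As background for the sample-based baseline mentioned above, rejection ABC, the simplest Approximate Bayesian Computation scheme, can be sketched as follows. This is not one of the neural methods the paper compares. The toy simulator, the uniform prior and all function names are assumptions made for illustration.

```python
import random


def rejection_abc(simulate, prior_sample, observed, eps, n_sims, rng):
    """Keep prior draws whose simulated summary lands within eps of the observation."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed) < eps:
            accepted.append(theta)
    return accepted


def gaussian_mean_simulator(theta, rng, n=20):
    """Toy simulator (assumption): sample mean of n draws from N(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def uniform_prior(rng):
    """Toy prior (assumption): theta ~ Uniform(-5, 5)."""
    return rng.uniform(-5.0, 5.0)
```

The accepted draws approximate the posterior, but most simulations are discarded, which is the inefficiency that motivates methods needing fewer simulations.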
Self-Adversarially Learned Bayesian Sampling
Zhao, Yang, Zhang, Jianyi, Chen, Changyou
Scalable Bayesian sampling is playing an important role in modern machine learning, especially in the fast-developing unsupervised-(deep)-learning models. While tremendous progress has been achieved via scalable Bayesian sampling such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD), the generated samples are typically highly correlated. Moreover, their sample-generation processes are often criticized as inefficient. In this paper, we propose a novel self-adversarial learning framework that automatically learns a conditional generator to mimic the behavior of a Markov kernel (transition kernel). High-quality samples can be efficiently generated by direct forward passes through a learned generator. Most importantly, the learning process adopts a self-learning paradigm, requiring no information on existing Markov kernels, e.g., knowledge of how to draw samples from them. Specifically, our framework learns to use current samples, either from the generator or pre-provided training data, to update the generator such that the generated samples progressively approach a target distribution; hence the name self-learning. Experiments on both synthetic and real datasets verify advantages of our framework, outperforming related methods in terms of both sampling efficiency and sample quality.
A Bayesian Approach to Time Series Forecasting – Towards Data Science
Today we are going to implement a Bayesian linear regression in R from scratch and use it to forecast US GDP growth. This post is based on a very informative manual from the Bank of England on Applied Bayesian Econometrics. I have translated the original Matlab code into R since it's open source and widely used in data analysis/science. My main goal in this post is to try and give people a better understanding of Bayesian statistics, some of its advantages and also some scenarios where you might want to use it. Let's take a moment to think about why we would even want to use Bayesian techniques in the first place.
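To make the idea of Bayesian linear regression concrete, here is a minimal sketch of the simplest conjugate case: a single slope coefficient with a Gaussian prior and a known noise variance. It is written in Python rather than the post's R, it is not the post's code, and the known-variance assumption and the function name `slope_posterior` are simplifications introduced for illustration.

```python
def slope_posterior(xs, ys, noise_var, prior_mean, prior_var):
    """Posterior mean and variance of b in y = b * x + e, with e ~ N(0, noise_var).

    Assumes a N(prior_mean, prior_var) prior on b and a known noise variance,
    so the posterior is Gaussian and available in closed form.
    """
    # Posterior precision is prior precision plus the data precision.
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    post_var = 1.0 / precision
    # Posterior mean is a precision-weighted blend of prior mean and data.
    weighted = prior_mean / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var
    return post_var * weighted, post_var
```

For a time series, `xs` could hold lagged values and `ys` the values one step ahead, giving a first-order autoregression. A vague prior (large `prior_var`) recovers the least-squares slope, while a tight prior pulls the estimate towards `prior_mean`.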
Cooperative Localisation of a GPS-Denied UAV using Direction of Arrival Measurements
Russell, James S., Ye, Mengbin, Anderson, Brian D. O., Hmam, Hatem, Sarunic, Peter
A GPS-denied UAV (Agent B) is localised through INS alignment with the aid of a nearby GPS-equipped UAV (Agent A), which broadcasts its position at several time instants. Agent B measures the signals' direction of arrival with respect to Agent B's inertial navigation frame. Semidefinite programming and the Orthogonal Procrustes algorithm are employed, and accuracy is improved through maximum likelihood estimation. The method is validated using flight data and simulations. A three-agent extension is explored.