Bayesian Inference
Continual Learning Using Bayesian Neural Networks
Li, HongLin, Barnaghi, Payam, Enshaeifar, Shirin, Ganz, Frieder
Continual learning models allow to learn and adapt to new changes and tasks over time. However, in continual and sequential learning scenarios in which the models are trained using different data with various distributions, neural networks tend to forget the previously learned knowledge. This phenomenon is often referred to as catastrophic forgetting. The catastrophic forgetting is an inevitable problem in continual learning models for dynamic environments. To address this issue, we propose a method, called Continual Bayesian Learning Networks (CBLN), which enables the networks to allocate additional resources to adapt to new tasks without forgetting the previously learned tasks. Using a Bayesian Neural Network, CBLN maintains a mixture of Gaussian posterior distributions that are associated with different tasks. The proposed method tries to optimise the number of resources that are needed to learn each task and avoids an exponential increase in the number of resources that are involved in learning multiple tasks. The proposed method does not need to access the past training data and can choose suitable weights to classify the data points during the test time automatically based on an uncertainty criterion. We have evaluated our method on the MNIST and UCR time-series datasets. The evaluation results show that our method can address the catastrophic forgetting problem at a promising rate compared to the state-of-the-art models.
Kernels over Sets of Finite Sets using RKHS Embeddings, with Application to Bayesian (Combinatorial) Optimization
Buathong, Poompol, Ginsbourger, David, Krityakierne, Tipaluck
We focus on kernel methods for set-valued inputs and their application to Bayesian set optimization, notably combinatorial optimization. We introduce a class of (strictly) positive definite kernels that relies on Reproducing Kernel Hilbert Space embeddings, and successfully generalizes "double sum" set kernels recently considered in Bayesian set optimization, which turn out to be unsuitable for combinatorial optimization. The proposed class of kernels, for which we provide theoretical guarantees, essentially consists in applying an outer kernel on top of the canonical distance induced by a double sum kernel. Proofs of theoretical results about considered kernels are complemented by a few practicalities regarding hyperparameter fitting. We furthermore demonstrate the applicability of our approach in prediction and optimization tasks, relying both on toy examples and on two test cases from mechanical engineering and hydrogeology, respectively. Experimental results illustrate the added value of the approach and open new perspectives in prediction and sequential design with set inputs.
Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks
von Kügelgen, Julius, Rubenstein, Paul K, Schölkopf, Bernhard, Weller, Adrian
We study the problem of causal discovery through targeted interventions. Starting from few observational measurements, we follow a Bayesian active learning approach to perform those experiments which, in expectation with respect to the current model, are maximally informative about the underlying causal structure. Unlike previous work, we consider the setting of continuous random variables with non-linear functional relationships, modelled with Gaussian process priors. To address the arising problem of choosing from an uncountable set of possible interventions, we propose to use Bayesian optimisation to efficiently maximise a Monte Carlo estimate of the expected information gain.
NGBoost: Natural Gradient Boosting for Probabilistic Prediction
Duan, Tony, Avati, Anand, Ding, Daisy Yi, Basu, Sanjay, Ng, Andrew Y., Schuler, Alejandro
We present Natural Gradient Boosting (NGBoost), an algorithm which brings probabilistic prediction capability to gradient boosting in a generic way. Predictive uncertainty estimation is crucial in many applications such as healthcare and weather forecasting. Probabilistic prediction, which is the approach where the model outputs a full probability distribution over the entire outcome space, is a natural way to quantify those uncertainties. Gradient Boosting Machines have been widely successful in prediction tasks on structured input data, but a simple boosting solution for probabilistic prediction of real valued outputs is yet to be made. NGBoost is a gradient boosting approach which uses the \emph{Natural Gradient} to address technical challenges that makes generic probabilistic prediction hard with existing gradient boosting methods. Our approach is modular with respect to the choice of base learner, probability distribution, and scoring rule. We show empirically on several regression datasets that NGBoost provides competitive predictive performance of both uncertainty estimates and traditional metrics.
How to Implement Bayesian Optimization from Scratch in Python
Many methods exist for function optimization, such as randomly sampling the variable search space, called random search, or systematically evaluating samples in a grid across the search space, called grid search. More principled methods are able to learn from sampling the space so that future samples are directed toward the parts of the search space that are most likely to contain the extrema. A directed approach to global optimization that uses probability is called Bayesian Optimization. Take my free 7-day email crash course now (with sample code). Click to sign-up and also get a free PDF Ebook version of the course.
DeepMind Researchers Develop Tools To Visualise ML Unfairness
Machine Learning engineers work around bias or the offsets in a model by drawing insights from the output, gauging the losses, going through tonnes of data and repeating till agreeable results have been obtained. This is a traditional process which takes time but works decently. An alternative to this approach is the Lagrangian approach, a mathematical method to find the local maxima and local minima of a function when provided with equality constraints. This too, comes with its own set of complexities. The unfairness of machine learning algorithms was exposed when they were deployed for manual tasks like hiring, surveillance and other such critical tasks, where the damages can be irreversible.
bayespy/bayespy
BayesPy provides tools for Bayesian inference with Python. The user constructs a model as a Bayesian network, observes data and runs posterior inference. The goal is to provide a tool which is efficient, flexible and extendable enough for expert use but also accessible for more casual users. Currently, only variational Bayesian inference for conjugate-exponential family (variational message passing) has been implemented. Future work includes variational approximations for other types of distributions and possibly other approximate inference methods such as expectation propagation, Laplace approximations, Markov chain Monte Carlo (MCMC) and other methods. BayesPy including the documentation is licensed under the MIT License.
Stochastic Triangular Mesh Mapping
Lombard, Clint D., van Daalen, Corné E.
For mobile robots to operate autonomously in general environments, perception is required in the form of a dense metric map. For this purpose, we present the stochastic triangular mesh (STM) mapping technique: a 2.5-D representation of the surface of the environment using a continuous mesh of triangular surface elements, where each surface element models the mean plane and roughness of the underlying surface. In contrast to existing mapping techniques, a STM map models the structure of the environment by ensuring a continuous model, while also being able to be incrementally updated with linear computational cost in the number of measurements. We reduce the effect of uncertainty in the robot pose (position and orientation) by using landmark-relative submaps. The uncertainty in the measurements and robot pose are accounted for by the use of Bayesian inference techniques during the map update. We demonstrate that a STM map can be used with sensors that generate point measurements, such as light detection and ranging (LiDAR) sensors and stereo cameras. We show that a STM map is a more accurate model than the only comparable online surface mapping technique$\unicode{x2014}$a standard elevation map$\unicode{x2014}$and we also provide qualitative results on practical datasets.
Distilling importance sampling
The two main approaches to Bayesian inference are sampling and optimisation methods. However many complicated posteriors are difficult to approximate by either. Therefore we propose a novel approach combining features of both. We use a flexible parameterised family of densities, such as a normalising flow. Given a density from this family approximating the posterior we use importance sampling to produce a weighted sample from a more accurate posterior approximation. This sample is then used in optimisation to update the parameters of the approximate density, a process we refer to as "distilling" the importance sampling results. We illustrate our method in a queueing model example.
How to Develop a Naive Bayes Classifier from Scratch in Python
Classification is a predictive modeling problem that involves assigning a label to a given input data sample. The problem of classification predictive modeling can be framed as calculating the conditional probability of a class label given a data sample. Bayes Theorem provides a principled way for calculating this conditional probability, although in practice requires an enormous number of samples (very large-sized dataset) and is computationally expensive. Instead, the calculation of Bayes Theorem can be simplified by making some assumptions, such as each input variable is independent of all other input variables. Although a dramatic and unrealistic assumption, this has the effect of making the calculations of the conditional probability tractable and results in an effective classification model referred to as Naive Bayes.