Bayesian Inference


Trust Region Value Optimization using Kalman Filtering

arXiv.org Machine Learning

Policy evaluation is a key process in reinforcement learning. It assesses a given policy using estimation of the corresponding value function. When using a parameterized function to approximate the value, it is common to optimize the set of parameters by minimizing the sum of squared Bellman Temporal Differences errors. However, this approach ignores certain distributional properties of both the errors and value parameters. Taking these distributions into account in the optimization process can provide useful information on the amount of confidence in value estimation. In this work we propose to optimize the value by minimizing a regularized objective function which forms a trust region over its parameters. We present a novel optimization method, the Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter. KOVA minimizes the regularized objective function by adopting a Bayesian perspective over both the value parameters and noisy observed returns. This distributional property provides information on parameter uncertainty in addition to value estimates. We provide theoretical results of our approach and analyze the performance of our proposed optimizer on domains with large state and action spaces.
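To make the Bayesian view of value parameters concrete, the following is a minimal sketch, not the KOVA algorithm itself. It assumes a linear value function V(s) = phi(s) · theta, a Gaussian belief over theta with covariance P, and a noisy scalar return as the observation. The function name `kalman_value_update` and all arguments are hypothetical. An Extended Kalman Filter would replace the feature vector `phi` with the gradient of a nonlinear value network.

```python
def kalman_value_update(theta, P, phi, target, obs_var):
    """One Kalman update of linear value parameters (illustrative sketch).

    theta   : list of floats, current parameter mean
    P       : list of lists, current parameter covariance (symmetric)
    phi     : list of floats, features of the visited state
    target  : float, noisy observed return for that state
    obs_var : float, assumed variance of the observation noise
    """
    n = len(theta)
    # Predicted value under the current parameter mean.
    pred = sum(t * f for t, f in zip(theta, phi))
    # P @ phi, reused for both the gain and the covariance update.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    # Innovation variance: phi^T P phi + observation noise.
    s = sum(phi[i] * P_phi[i] for i in range(n)) + obs_var
    gain = [v / s for v in P_phi]
    # Mean moves toward the target; a small P keeps it close to the old theta.
    theta_new = [theta[i] + gain[i] * (target - pred) for i in range(n)]
    # Covariance shrinks along the observed feature direction.
    P_new = [[P[i][j] - gain[i] * P_phi[j] for j in range(n)] for i in range(n)]
    return theta_new, P_new


theta, P = kalman_value_update(
    theta=[0.0, 0.0],
    P=[[1.0, 0.0], [0.0, 1.0]],
    phi=[1.0, 0.0],
    target=1.0,
    obs_var=1.0,
)
```

In this sketch the prior covariance plays the role of a regularizer: the smaller P is, the less a single noisy return can move the parameters, and the updated P is the uncertainty estimate that a plain squared-error fit does not provide.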


Nonparametric Bayesian Deep Networks with Local Competition

arXiv.org Machine Learning

Local competition among neighboring neurons is a common procedure taking place in biological systems. This finding has inspired research on more biologically plausible deep networks that comprise competing linear units, as opposed to nonlinear units that do not entail any form of (local) competition. This paper revisits this modeling paradigm, with the aim of enabling inference of networks that retain state-of-the-art accuracy for the least possible model complexity; this includes the needed number of connections or locally competing sets of units, as well as the required floating-point precision for storing the network weights. To this end, we leverage solid arguments from the field of Bayesian nonparametrics. Specifically, we introduce auxiliary discrete latent variables of model component utility, and perform Bayesian inference over them. Then, we impose appropriate stick-breaking priors over the introduced discrete latent variables; these give rise to a well-established sparsity-inducing mechanism. As we experimentally show using benchmark datasets, our approach yields networks with less memory footprint than the state-of-the-art, and with no compromises in predictive accuracy.


Neutron drip line in the Ca region from Bayesian model averaging

arXiv.org Machine Learning

The region of heavy calcium isotopes forms the frontier of experimental and theoretical nuclear structure research where the basic concepts of nuclear physics are put to stringent test. The recent discovery of the extremely neutron-rich nuclei around $^{60}$Ca [Tarasov, 2018] and the experimental determination of masses for $^{55-57}$Ca [Michimasa, 2018] provide unique information about the binding energy surface in this region. To assess the impact of these experimental discoveries on the nuclear landscape's extent, we use global mass models and statistical machine learning to make predictions, with quantified levels of certainty, for bound nuclides between Si and Ti. Using a Bayesian model averaging analysis based on Gaussian-process-based extrapolations we introduce the posterior probability $p_{ex}$ for each nucleus to be bound to neutron emission. We find that extrapolations for drip-line locations, at which the nuclear binding ends, are consistent across the global mass models used, in spite of significant variations between their raw predictions. In particular, considering the current experimental information and current global mass models, we predict that $^{68}$Ca has an average posterior probability ${p_{ex}\approx 76}$% to be bound to two-neutron emission while the nucleus $^{61}$Ca is likely to decay by emitting a neutron (${p_{ex}\approx 46}$%).


Online Estimation of Multiple Dynamic Graphs in Pattern Sequences

arXiv.org Machine Learning

Many time-series data including text, movies, and biological signals can be represented as sequences of correlated binary patterns. These patterns may be described by weighted combinations of a few dominant structures that underpin specific interactions among the binary elements. To extract the dominant correlation structures and their contributions to generating data in a time-dependent manner, we model the dynamics of binary patterns using the state-space model of an Ising-type network that is composed of multiple undirected graphs. We provide a sequential Bayes algorithm to estimate the dynamics of weights on the graphs while gaining the graph structures online. This model can uncover overlapping graphs underlying the data better than a traditional orthogonal decomposition method, and outperforms an original time-dependent full Ising model. We assess the performance of the method by simulated data, and demonstrate that spontaneous activity of cultured hippocampal neurons is represented by dynamics of multiple graphs.


10 Major Machine Learning Algorithms And Their Application

#artificialintelligence

Algorithms are the smart and powerful soldiers of a complex machine learning model. In other words, machine learning algorithms are the core foundation when we work with data or when it comes to training a model. In this article, you and I are going on a tour called "7 major machine learning algorithms and their application". The purpose of this tour is either to refresh your memory or to gain an essential understanding of machine learning algorithms. Along the way we will answer the main questions: what purpose machine learning algorithms serve, and where, when, and how to use them. Before going deeper, let's have a brief introduction. Machine learning algorithms are mainly classified into three broad categories, i.e., supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the machine is taught by example. Here the operator provides the machine learning algorithm with a dataset that includes the desired input and output variables. Using this set of variables, we generate a function that maps inputs to the desired outputs.


Calibration with Bias-Corrected Temperature Scaling Improves Domain Adaptation Under Label Shift in Modern Neural Networks

arXiv.org Machine Learning

Label shift refers to the phenomenon where the marginal probability p(y) of observing a particular class changes between the training and test distributions while the conditional probability p(x|y) stays fixed. This is relevant in settings such as medical diagnosis, where a classifier trained to predict disease based on observed symptoms may need to be adapted to a different distribution where the baseline frequency of the disease is higher. Given calibrated estimates of p(y|x), one can apply an EM algorithm to correct for the shift in class imbalance between the training and test distributions without ever needing to calculate p(x|y). Unfortunately, modern neural networks typically fail to produce well-calibrated probabilities, compromising the effectiveness of this approach. Although Temperature Scaling can greatly reduce miscalibration in these networks, it can leave behind a systematic bias in the probabilities that still poses a problem. To address this, we extend Temperature Scaling with class-specific bias parameters, which largely eliminates systematic bias in the calibrated probabilities and allows for effective domain adaptation under label shift. We term our calibration approach "Bias-Corrected Temperature Scaling". On experiments with CIFAR10, we find that EM with Bias-Corrected Temperature Scaling significantly outperforms both EM with Temperature Scaling and the recently-proposed Black-Box Shift Estimation.
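The two ingredients described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `bias_corrected_softmax` applies a temperature and per-class biases to logits (both would normally be fitted on held-out data, which is not shown), and `em_label_shift` is the standard EM re-estimation of class priors from calibrated p(y|x). All function and parameter names are hypothetical.

```python
import math


def bias_corrected_softmax(logits, temperature, biases):
    """Softmax of logits / temperature + per-class bias."""
    scaled = [z / temperature + b for z, b in zip(logits, biases)]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def em_label_shift(probs, train_priors, n_iter=200):
    """Estimate test-set class priors from calibrated p_train(y|x).

    probs        : list of per-sample probability lists on the test set
    train_priors : class frequencies in the training set
    """
    k = len(train_priors)
    priors = list(train_priors)  # start from the training priors
    for _ in range(n_iter):
        # E-step: reweight each prediction by the current prior ratio.
        adjusted = []
        for p in probs:
            w = [p[c] * priors[c] / train_priors[c] for c in range(k)]
            s = sum(w)
            adjusted.append([v / s for v in w])
        # M-step: new priors are the mean adjusted posteriors.
        priors = [sum(a[c] for a in adjusted) / len(adjusted) for c in range(k)]
    return priors


# Toy test set: most samples look like class 0, although training was balanced.
test_probs = [[0.9, 0.1]] * 8 + [[0.1, 0.9]] * 2
estimated = em_label_shift(test_probs, train_priors=[0.5, 0.5])
```

Note that the procedure only ever touches p(y|x) and the class priors, which is why p(x|y) never has to be computed. The estimate is only as good as the calibration of the input probabilities.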


Fitting A Mixture Distribution to Data: Tutorial

arXiv.org Machine Learning

This paper is a step-by-step tutorial for fitting a mixture distribution to data. It merely assumes the reader has the background of calculus and linear algebra. Other required background is briefly reviewed before explaining the main algorithm. In explaining the main algorithm, first, fitting a mixture of two distributions is detailed and examples of fitting two Gaussians and Poissons, respectively for continuous and discrete cases, are introduced. Thereafter, fitting several distributions in general case is explained and examples of several Gaussians (Gaussian Mixture Model) and Poissons are again provided. Model-based clustering, as one of the applications of mixture distributions, is also introduced. Numerical simulations are also provided for both Gaussian and Poisson examples for the sake of better clarification.
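As a rough illustration of the two-Gaussian case mentioned above, here is a minimal expectation-maximization sketch for one-dimensional data. It is not taken from the tutorial; the initialization (means at the data extremes, shared variance) and the variance floor are assumptions made to keep the example short.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed initialization: one mean at each extreme, shared variance.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: weighted re-estimation of weight, mean, and variance.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


data = [-0.2, 0.0, 0.1, 0.3, -0.1, 9.8, 10.0, 10.2, 10.1, 9.9]
weights, means, variances = fit_two_gaussians(data)
```

Assigning each point to the component with the larger responsibility turns the fitted mixture into a simple model-based clustering of the data.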


An intuitive guide to Gaussian processes – Towards Data Science

#artificialintelligence

Machine learning is using data we have (known as training data) to learn a function that we can use to make predictions about data we don't have yet. The simplest example of this is linear regression, where we learn the slope and intercept of a line so we can predict the vertical position of points from their horizontal position. This is shown below: the training data are the blue points and the learnt function is the red line. Machine learning is an extension of linear regression in a few ways. Secondly, modern ML uses much more powerful methods for extracting patterns, of which deep learning is only one of many.
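The linear regression example can be written out in a few lines. This is a minimal sketch using the closed-form least-squares solution; the function names and the toy data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and co-variation of x with y, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Vertical position predicted for horizontal position x."""
    return slope * x + intercept


# Toy training data lying exactly on y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(train_x, train_y)
y_new = predict(4.0, slope, intercept)  # prediction for a point not in the training data
```

Here the "learned function" is fully described by two numbers, which is what makes linear regression a convenient starting point before moving to more flexible models such as Gaussian processes.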


A combined entropy and utility based generative model for large scale multiple discrete-continuous travel behaviour data

arXiv.org Machine Learning

Generative models, either by simple clustering algorithms or deep neural network architecture, have been developed as a probabilistic estimation method for dimension reduction or to model the underlying properties of data structures. Although their apparent use has largely been limited to image recognition and classification, generative machine learning algorithms can be a powerful tool for travel behaviour research. In this paper, we examine the generative machine learning approach for analyzing multiple discrete-continuous (MDC) travel behaviour data to understand the underlying heterogeneity and correlation, increasing the representational power of such travel behaviour models. We show that generative models are conceptually similar to choice selection behaviour process through information entropy and variational Bayesian inference. Specifically, we consider a restricted Boltzmann machine (RBM) based algorithm with multiple discrete-continuous layer, formulated as a variational Bayesian inference optimization problem. We systematically describe the proposed machine learning algorithm and develop a process of analyzing travel behaviour data from a generative learning perspective. We show parameter stability from model analysis and simulation tests on an open dataset with multiple discrete-continuous dimensions and a size of 293,330 observations. For interpretability, we derive analytical methods for conditional probabilities as well as elasticities. Our results indicate that latent variables in generative models can accurately represent joint distribution consistently w.r.t multiple discrete-continuous variables. Lastly, we show that our model can generate statistically similar data distributions for travel forecasting and prediction.


Optimized Realization of Bayesian Networks in Reduced Normal Form using Latent Variable Model

arXiv.org Machine Learning

Bayesian networks in their Factor Graph Reduced Normal Form (FGrn) are a powerful paradigm for implementing inference graphs. Unfortunately, the computational and memory costs of these networks may be considerable, even for relatively small networks, and this is one of the main reasons why these structures have often been underused in practice. In this work, through a detailed algorithmic and structural analysis, various solutions for cost reduction are proposed. An online version of the classic batch learning algorithm is also analyzed, showing very similar results (in an unsupervised context); which is essential even if multilevel structures are to be built. The solutions proposed, together with the possible online learning algorithm, are included in a C++ library that is quite efficient, especially if compared to the direct use of the well-known sum-product and Maximum Likelihood (ML) algorithms. The results are discussed with particular reference to a Latent Variable Model (LVM) structure.